Saturday, November 2, 2013

Linux 3.12

Summary: This release adds support for offline deduplication in Btrfs, automatic GPU switching in laptops with dual GPUs, a performance boost for AMD Radeon graphics, better RAID-5 multicore performance, improved handling of out-of-memory situations, improved VFS path name resolution scalability, improvements to the timerless multitasking mode, separate modesetting and rendering device nodes in the graphics DRM layer, improved locking performance for virtualized guests, XFS directory recursion scalability improvements, IPC scalability improvements, tty layer locking improvements, new drivers and many small improvements.

  1. Prominent features
    1. Offline data deduplication support in Btrfs
    2. Graphic performance boost for AMD Radeon hardware
    3. Automatic GPU switching in laptops with dual GPUs
    4. Separate device nodes for graphics mode setting and rendering
    5. Improved timerless multitasking: allow the timekeeping CPU to go idle
    6. RAID5 multithreading
    7. Improved locking performance for virtualized guests
    8. New lockref locking scheme, VFS locking improvements
    9. Better Out-Of-Memory handling
    10. XFS directory recursion scalability, namespace support
    11. Improved tty layer locking
    12. IPC locking improvements
  2. Drivers and architectures
  3. Core
  4. Memory management
  5. Block layer
  6. File systems
  7. Networking
  8. Crypto
  9. Virtualization
  10. Security
  11. Tracing/perf
  12. Other news sites that track the changes of this release

1. Prominent features


1.1. Offline data deduplication support in Btrfs


The Btrfs filesystem has gained support for offline data deduplication. Deduplication consists of removing duplicate copies of data in the filesystem: since the data is identical, only one copy needs to be stored. In some workloads, such as virtual machine images (which often contain similar copies of operating systems) or backups, the savings can be enormous. "Offline" means that deduplication is not done automatically and transparently as processes write data; instead, it is triggered by userspace software at a time chosen by the system administrator, while the filesystem is mounted and in use. Online deduplication is planned for future releases.

The bedup deduplication tool has a branch that works with this support. The branch can be found here.

The author of the deduplication support has also written a sample deduplication tool, duperemove, which can be found here.

Code: commit
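Offline deduplication leaves the choice of what to deduplicate to userspace. The sketch below shows, under stated assumptions, the kind of scan such a tool might do: hash fixed-size blocks and report contents that appear more than once. The block size and the function name `find_duplicate_blocks` are hypothetical, and this is not the code of bedup or duperemove. Candidates found this way would then be handed to the kernel through the Btrfs `BTRFS_IOC_FILE_EXTENT_SAME` ioctl added by this support, which compares the data itself before sharing extents, so a hash collision cannot merge two different blocks.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 128 * 1024  # hypothetical comparison granularity


def find_duplicate_blocks(paths, block_size=BLOCK_SIZE):
    """Return {sha256 digest: [(path, offset), ...]} for blocks seen more than once.

    Only full-size blocks are considered; a short tail block is ignored.
    """
    seen = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                data = f.read(block_size)
                if len(data) < block_size:
                    break  # end of file or short tail block
                seen[hashlib.sha256(data).digest()].append((path, offset))
                offset += block_size
    # Keep only contents that occur at least twice: these are dedup candidates.
    return {digest: locs for digest, locs in seen.items() if len(locs) > 1}
```

Hashing keeps the scan cheap in memory; the kernel-side byte comparison is what makes acting on the candidates safe.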

1.2. Graphic performance boost for AMD Radeon hardware


The website Phoronix.com found that graphics performance on modern AMD Radeon GPUs had improved considerably in Linux 3.12. However, there had not been any significant change in the Radeon driver that could explain such large gains. After some investigation, Phoronix found that the change responsible for the boost was not in the Radeon driver itself, but in the algorithm of the cpufreq ondemand governor. Apparently, the ondemand governor was oscillating too much between frequencies, and this oscillation hurt graphics performance on Radeon GPUs. The new frequency algorithm eliminates this problem.

Code: commit

1.3. Automatic GPU switching in laptops with dual GPUs


Some laptops, such as those using Nvidia Optimus, have two GPUs: one optimized for performance and the other for power saving. Until now, some hacks were needed to switch between these GPUs. In this release, the driver handles the switch automatically.

Code: commit 1, 2, 3

1.4. Separate device nodes for graphics mode setting and rendering


Recent hardware development (especially on ARM) shows that rendering (via the GPU) and mode setting (via the display controller) are not necessarily tied to the same graphics device. This release adds support in the graphics layer for separate device nodes for mode setting and rendering. The main use is to allow different access modes for graphics compositors (which require the modeset API) and for client-side rendering or GPGPU users (which both require the rendering API).

For more information, see this blog post: Splitting DRM and KMS device nodes

Code: commit 1, 2, 3, 4
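With split nodes, a program that only renders can open a render node instead of the primary card node that carries mode-setting rights. The sketch below, assuming the usual naming under `/dev/dri` (`cardN` for primary nodes, `controlDN` for control nodes, `renderDN` for render nodes, numbered from 128), picks out the render nodes; the helper names are hypothetical. In 3.12 render nodes were still experimental and had to be enabled with the `drm.rnodes=1` module parameter.

```python
import os
import re

# Node-name patterns under /dev/dri (assumed naming convention).
_NODE_PATTERNS = {
    "primary": re.compile(r"card\d+"),      # legacy node: modesetting and rendering
    "control": re.compile(r"controlD\d+"),
    "render": re.compile(r"renderD\d+"),    # rendering only, no modesetting
}


def drm_node_kind(name):
    """Classify a /dev/dri entry name, or return None if it is not a DRM node."""
    for kind, pattern in _NODE_PATTERNS.items():
        if pattern.fullmatch(name):
            return kind
    return None


def render_nodes(dev_dir="/dev/dri"):
    """Paths of render nodes that a GPGPU or client-side renderer could open."""
    try:
        names = os.listdir(dev_dir)
    except FileNotFoundError:
        return []  # no DRM devices, or no /dev/dri at all
    return sorted(os.path.join(dev_dir, n) for n in names
                  if drm_node_kind(n) == "render")
```

A compositor would keep opening the card node, while clients that never touch modesetting can be given access to the render node alone.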

1.5. Improved timerless multitasking: allow the timekeeping CPU to go idle


Linux 3.10 added support for timerless multitasking, that is, the ability to run processes without firing the timer interrupt that is traditionally used to implement multitasking. This support, however, had a caveat: it could turn off the timer interrupt on all CPUs except one, which is used to keep track of time on behalf of the other CPUs. That CPU kept its timer running even when all CPUs were idle, which was wasteful. This release allows the timer on the timekeeping CPU to be turned off when all CPUs are idle.

Recommended LWN article: Is the whole system idle?

Code: commit 1, 2, 3, 4, 5, 6, 7, 8

1.6. RAID5 multithreading


This release attempts to spread the work needed to handle RAID5 stripes across multiple CPUs in the MD ("multiple devices") layer, which allows more I/O operations per second on fast (SSD) devices.

Code: commit 1, 2
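RAID5 parity for a stripe is the byte-wise XOR of its data chunks, and different stripes can be handled independently, which is what makes spreading the work across CPUs possible. The sketch below illustrates only that idea with a Python thread pool; the function names and the pool are assumptions, not how the md driver's worker threads are implemented. Because CPython's global interpreter lock serializes the XOR itself, it shows the distribution of work rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def xor_parity(chunks):
    """RAID5 parity of one stripe: byte-wise XOR of equally sized chunks."""
    acc = 0
    for chunk in chunks:
        acc ^= int.from_bytes(chunk, "little")
    return acc.to_bytes(len(chunks[0]), "little")


def rebuild_chunk(surviving_chunks, parity):
    """Recover a lost chunk: XOR of the surviving chunks and the parity."""
    return xor_parity(list(surviving_chunks) + [parity])


def compute_parities(stripes, workers=4):
    """Compute parity for many stripes, handing stripes to several workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(xor_parity, stripes))
```

The same XOR that produces the parity also rebuilds any single missing chunk, which is why one parity chunk per stripe is enough for RAID5.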

1.7. Improved locking performance for virtualized guests


The operating system running in each virtualized guest also uses its own locks. Some lock types, such as spinlocks, cause problems when many guests are present: waiting virtual CPUs keep spinning and waste host CPU time, among other problems. This release replaces paravirtualized spinlocks with paravirtualized ticket spinlocks, which have better performance properties for virtualized guests and bring speedups on various benchmarks.


Code: commit 1, 2, 3, 4
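A ticket spinlock hands out tickets in order and serves them first come, first served. The paravirtualized variant spins only for a bounded time, then puts the waiting virtual CPU to sleep until the unlocker "kicks" the holder of the next ticket, instead of burning host CPU time. The Python model below sketches that scheme; `SPIN_THRESHOLD`, the class name, and the use of `threading.Event` to stand in for halting and kicking a vCPU are assumptions of the model, not the kernel's x86 or KVM code.

```python
import threading

SPIN_THRESHOLD = 100  # hypothetical spin budget before "halting"


class PvTicketLock:
    """Toy paravirtualized ticket lock: spin briefly, then sleep until kicked."""

    def __init__(self):
        self._atomic = threading.Lock()  # stands in for atomic instructions
        self._next = 0                   # next ticket to hand out
        self._owner = 0                  # ticket currently being served
        self._sleepers = {}              # ticket -> Event of a "halted" waiter

    def acquire(self):
        with self._atomic:               # atomic fetch-and-add on the tail
            ticket = self._next
            self._next += 1
        for _ in range(SPIN_THRESHOLD):  # fast path: ordinary ticket spinning
            if self._owner == ticket:
                return
        wakeup = threading.Event()
        with self._atomic:
            if self._owner == ticket:    # lock handed over while registering
                return
            self._sleepers[ticket] = wakeup
        wakeup.wait()                    # slow path: "halt" until kicked

    def release(self):
        with self._atomic:
            self._owner += 1
            wakeup = self._sleepers.pop(self._owner, None)
        if wakeup is not None:
            wakeup.set()                 # "kick" the waiter holding the next ticket
```

Registering the sleeper and re-checking the owner under the same atomic section closes the window in which a kick could be lost.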

1.8. New lockref locking scheme, VFS locking improvements


This release adds a new locking scheme called "lockref". The lockref structure combines a spinlock and a reference count in a way that allows optimized reference count updates. It guarantees that the reference count is updated as if the spinlock were held, but by using atomic accesses that cover both the reference count and the spinlock words, it can often perform the update without actually taking the lock. This avoids the worst cases of spinlock contention on large machines. When reference counts are updated on a large system, the cache line still bounces between CPUs, but that is much less noticeable than having to spin waiting for the lock. This release already uses lockref to improve the scalability of heavy pathname lookup on large systems.

Recommended LWN article: Introducing lockrefs

Code: commit 1, 2, 3, 4, 5, 6
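The core of the idea is that the lock and the count live in one word that can be updated with a single compare-and-exchange, and the fast path succeeds only while that word shows the lock as free. The Python model below sketches this; the word layout (a `LOCKED` bit above a 32-bit count), the class and method names, and the `threading.Lock` used to emulate an atomic cmpxchg instruction are all assumptions of the model rather than the kernel's `struct lockref` code.

```python
import threading

LOCKED = 1 << 32         # assumed layout: bit 32 is the spinlock...
COUNT_MASK = LOCKED - 1  # ...and the low 32 bits hold the reference count


class Lockref:
    """Toy lockref: a spinlock and a reference count packed into one word."""

    def __init__(self, count=0):
        self._word = count
        self._cas_guard = threading.Lock()  # emulates an atomic cmpxchg

    def _cmpxchg(self, old, new):
        """Atomically store new if the word still equals old."""
        with self._cas_guard:
            if self._word == old:
                self._word = new
                return True
            return False

    def spin_lock(self):
        while True:
            old = self._word
            if not old & LOCKED and self._cmpxchg(old, old | LOCKED):
                return

    def spin_unlock(self):
        old = self._word  # nothing else changes the word while we hold the lock
        self._cmpxchg(old, old & ~LOCKED)

    @property
    def count(self):
        return self._word & COUNT_MASK

    def get(self):
        """Take a reference, touching the spinlock only if it is already held."""
        while True:
            old = self._word
            if old & LOCKED:
                break                        # lock held: fall back to slow path
            if self._cmpxchg(old, old + 1):  # count updated while lock seen free
                return
        self.spin_lock()
        self._word += 1  # safe: every cmpxchg path requires the lock bit clear
        self.spin_unlock()
```

Because the fast path refuses to run while the lock bit is set, code that holds the spinlock still sees a stable count, which is the guarantee described above.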

1.9. Better Out-Of-Memory handling


The out-of-memory (OOM) state happens when the computer runs out of RAM and swap. When Linux enters this state, it kills a process in order to free memory. This release includes important changes to how out-of-memory situations are handled, affecting the number of out-of-memory errors sent to userspace and improving reliability. For more details, see the LWN article below.

Recommended LWN article: Reliable out-of-memory handling

Code: commit 1, 2, 3, 4, 5, 6, 7

1.10. XFS directory recursion scalability, namespace support


XFS now supports storing the file type in directory entries, so that readdir can return to userspace the type of the inode a directory entry points to without first having to read the inode off disk. This greatly improves the performance of directory recursion. A parallel walk of ~50 million directory entries across hundreds of directories goes from roughly 500 getdents() calls per second plus 250,000 inode lookups per second (to determine the inode type), at roughly 17,000 read IOPS, to 3,500 getdents() calls per second at 16,000 IOPS, with no inode lookups at all.

This release also adds XFS support for namespaces, and restores defragmentation support for the new CRC filesystem format.

Code: commit 1, 2, 3, 4, 5
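Programs that walk large trees only see this benefit if they take the entry type from readdir instead of calling stat() on every entry. As one illustration, Python's `os.scandir()` exposes the `d_type` value through `DirEntry.is_dir()`, which needs no extra system call for non-symlinks unless the filesystem reports the type as unknown. The function name `count_tree` is hypothetical.

```python
import os


def count_tree(root):
    """Count directories and other entries below root.

    DirEntry.is_dir(follow_symlinks=False) is answered from the d_type that
    readdir returned; Python falls back to lstat() only when the filesystem
    reports DT_UNKNOWN.
    """
    dirs = others = 0
    pending = [root]
    while pending:
        with os.scandir(pending.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    dirs += 1
                    pending.append(entry.path)
                else:
                    others += 1
    return dirs, others
```

Passing `follow_symlinks=False` matters: following a symlink would require a stat() of the target even when `d_type` is known.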

1.11. Improved tty layer locking


The tty layer locking was cleaned up, and in the process much of the locking became per-tty, which makes a noticeable difference on some unusual workloads.

Commits: merge commit

1.12. IPC locking improvements


This release reduces the amount of contention imposed on the IPC lock (kern_ipc_perm.lock). These changes mostly deal with shared memory; similar work was already done for semaphores in 3.10 and for message queues in 3.11.

With these changes, the execution time of a custom shm microbenchmark that stresses shmctl by doing IPC_STAT a million times with 4 threads drops by 50%. A similar run with IPC_SET drops from 3 minutes 35 seconds to 27 seconds.

Code: commit, 2, 3, 4, 5, 6, 7, 8, 9, 10
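A rough way to reproduce the kind of microbenchmark described above is to call shmctl(IPC_STAT) on one shared memory segment from several threads at once. The Python sketch below does that through ctypes; the function name, the per-thread interpretation of the iteration count, and the 512-byte buffer (chosen to be larger than struct shmid_ds) are assumptions, the constants are the Linux values from <sys/ipc.h>, and this is not the original benchmark. Python's per-call overhead blurs the numbers, so a C version would measure the lock more faithfully.

```python
import ctypes
import ctypes.util
import threading
import time

# Linux values from <sys/ipc.h> (assumed; other systems may differ).
IPC_PRIVATE, IPC_CREAT, IPC_RMID, IPC_STAT = 0, 0o1000, 0, 2

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.shmget.argtypes = [ctypes.c_int, ctypes.c_size_t, ctypes.c_int]
libc.shmctl.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_void_p]


def shmctl_stat_bench(threads=4, iterations=1_000_000):
    """Time `threads` threads each calling shmctl(IPC_STAT) `iterations` times."""
    shmid = libc.shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0o600)
    if shmid < 0:
        raise OSError(ctypes.get_errno(), "shmget failed")
    try:
        probe = ctypes.create_string_buffer(512)  # larger than struct shmid_ds
        if libc.shmctl(shmid, IPC_STAT, probe) != 0:
            raise OSError(ctypes.get_errno(), "shmctl(IPC_STAT) failed")

        def worker():
            buf = ctypes.create_string_buffer(512)
            for _ in range(iterations):
                libc.shmctl(shmid, IPC_STAT, buf)  # each call takes the IPC lock

        pool = [threading.Thread(target=worker) for _ in range(threads)]
        start = time.perf_counter()
        for t in pool:
            t.start()
        for t in pool:
            t.join()
        return time.perf_counter() - start
    finally:
        libc.shmctl(shmid, IPC_RMID, None)  # remove the segment
```

ctypes releases the interpreter lock around each foreign call, so the threads do reach the kernel concurrently and contend on the segment's lock.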

2. Drivers and architectures


All the driver and architecture-specific changes can be found in the Linux_3.12-DriversArch page

3. Core


  • task scheduler: Implement smarter wake-affine logic commit
  • seqlock: Add a new locking reader type commit
  • idr: Percpu ida commit
  • initmpfs: use initramfs if rootfstype= or root= specified commit
  • Lock in place mounts from more privileged users commit
  • sysfs: Restrict mounting sysfs commit
  • CacheFiles: Implement interface to check cache consistency commit, commit
  • modules: add support for soft module dependencies commit, commit
  • Add support to aio ring pages migration commit
  • Implement generic deferred AIO completions commit

4. Memory management


  • Rework the caching shrinking mechanisms, recommended LWN article: Smarter shrinkers, commit
  • Data writeback: add a strictlimit feature, which prevents untrusted filesystems (i.e., FUSE mounts created by unprivileged users) from accumulating a large number of dirty pages before being throttled. commit
  • Page allocator: fair zone allocator policy commit
  • Add hugepage node migration support commit, commit, commit
  • Account anon transparent huge pages into NR_ANON_PAGES commit
  • swap: change block allocation algorithm for SSD commit
  • swap: make cluster allocation per-cpu commit
  • swap: make swap discard async commit

5. Block layer


  • Detect hybrid MBRs commit
  • dm cache: add data block size limits. The data block size cannot be an arbitrary number: it must be between 32KB and 1GB and a multiple of 32KB. commit
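dm-cache takes the data block size in 512-byte sectors in its table line, so the limits above correspond to 64 through 2,097,152 sectors in steps of 64. A small validation helper, with a hypothetical name, makes the rule concrete:

```python
SECTOR_SIZE = 512
MIN_BLOCK_BYTES = 32 * 1024    # 32KB
MAX_BLOCK_BYTES = 1024 ** 3    # 1GB


def valid_cache_block_size(sectors):
    """Check a dm-cache data block size given in 512-byte sectors."""
    size = sectors * SECTOR_SIZE
    return MIN_BLOCK_BYTES <= size <= MAX_BLOCK_BYTES and size % MIN_BLOCK_BYTES == 0
```

For example, 64 sectors (32KB) and 128 sectors (64KB) pass, while 96 sectors (48KB) is rejected because it is not a multiple of 32KB.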

6. File systems


  • Btrfs
    • Limit the size of delayed allocation ranges, which will limit extent sizes to 128 MB commit
    • Allow compressed extents to be merged during defragment commit
    • Add mount option to force UUID tree checking commit
    • Check UUID tree during mount if required commit
    • Create UUID tree if required commit
    • Fill UUID tree initially commit
  • Ext4
    • Add support for extent pre-caching through a new fiemap flag. This is critically important when using AIO on a preallocated file. commit, commit
    • Allow specifying external journal by pathname mount option commit
    • Mark block group as corrupt on block bitmap error commit
    • Mark group corrupt on group descriptor checksum commit
  • Ext3
    • Allow specifying external journal by pathname mount option commit
  • XFS
    • Add support for the Q_XGETQSTATV quotactl command commit
    • Introduce object readahead to log recovery commit
  • F2FS
  • Pstore
    • Add compression support to pstore commit
    • Add decompression support to pstore commit
    • Add file extension to pstore file if compressed commit
  • CEPH
  • HFS+
    • Implement POSIX ACLs support commit
  • NFS
    • Refuse mount attempts with proto=udp commit
  • isofs
    • Refuse RW mount of the filesystem instead of making it RO commit
  • udf
    • Refuse RW mount of the filesystem instead of making it RO commit

7. Networking


  • tcp: TCP_NOTSENT_LOWAT socket option commit
  • tcp: TSO packets automatic sizing commit
  • tcp: add tcp_syncookies mode to allow unconditional generation of syncookies commit
  • tcp: increase throughput when reordering is high commit
  • tcp: prefer packet timing to TS-ECR for RTT commit
  • tcp: use RTT from SACK for RTO commit
  • ipv6: Add generic UDP Tunnel segmentation commit
  • ipv6: drop fragmented ndisc packets by default (RFC 6980) commit
  • ipv6: mld: implement RFC3810 MLDv2 mode only commit
  • bridge: apply multicast snooping to IPv6 link-local, too commit
  • macvlan fdb replace support commit
  • Devices: export physical port id via sysfs commit
  • igmp: Allow user-space configuration of igmp unsolicited report interval commit
  • tcp_probe: add IPv6 support commit
  • tcp_probe: allow more advanced ingress filtering by mark commit
  • netfilter: add IPv6 SYNPROXY target commit
  • Wireless
    • Use reduced txpower for 5 and 10 MHz commit
    • Add packet coalesce support commit
    • Allow scanning for 5/10 MHz channels in IBSS commit
  • openvswitch
    • Add SCTP support commit
    • Add vxlan tunneling support. commit
    • Mega flow implementation commit
  • pkt_sched: fq: Fair Queue packet scheduler commit
  • pktgen: Add UDPCSUM flag to support UDP checksums commit
  • qdisc: allow setting default queuing discipline commit
  • tun: Add ability to create tun device with given index commit
  • tun: Allow to skip filter on attach commit
  • tun: Support software transmit time stamping. commit
  • tuntap: hardware vlan tx support commit
  • vxlan: Add tx-vlan offload support. commit
  • vxlan: add ipv6 support commit
  • NFC: Add a GET_SE netlink API, which dumps a list of discovered secure elements in an NFC controller commit
  • Infiniband: Add receive flow steering support commit
  • USBNET: Improving USB3 throughput commit
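TCP_NOTSENT_LOWAT limits how much not-yet-sent data a socket may queue: poll() and similar calls report the socket as writable only while the unsent data is below the threshold, which keeps latency-sensitive senders from filling the kernel buffer. The same commit also added a system-wide default, the net.ipv4.tcp_notsent_lowat sysctl. A minimal sketch of setting the option per socket follows; the 16KB value and the helper name are arbitrary examples, and the fallback constant 25 is the Linux value from <linux/tcp.h> for Python builds whose socket module does not define it.

```python
import socket

# Linux value from <linux/tcp.h>, used if this Python's socket module lacks it.
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)


def tcp_socket_with_notsent_lowat(lowat=16 * 1024):
    """Create a TCP socket that polls writable only while unsent data < lowat."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, lowat)
    return sock
```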

8. Crypto


  • omap-sham - Add OMAP5/AM43XX SHAM Support commit
  • omap-sham - Add SHA384 and SHA512 Support commit
  • Add NEON accelerated XOR implementation commit
  • Add OMAP4 random generator support commit

9. Virtualization


  • Adds nested EPT support to KVM's nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use EPT when running a nested guest L2 commit
  • vmware: Add support for virtual IOMMU commit, Add support for virtual IOMMU in VMXNET3 commit
  • vfio-pci: PCI hot reset interface commit
  • vfio: add external user support commit
  • xen: Support 64-bit PV guest receiving NMIs commit
  • Add xen tpmfront interface commit

10. Security


  • Apparmor
    • Add an optional profile attachment string for profiles commit
    • Add interface files for profiles and namespaces commit
    • Add the ability to report a sha1 hash of loaded policy commit
    • Add the profile introspection file to interface commit
    • Allow setting any profile into the unconfined state commit
    • Enable users to query whether apparmor is enabled commit

11. Tracing/perf


  • perf
    • Add option to limit stack depth in callchain dumps commit
    • Add option to print stack trace on single line commit
    • diff: Add generic order option for compute sorting commit
    • diff: Update perf diff documentation for multiple data comparison commit
    • gtk/hists: Display callchain overhead also commit
    • kvm stat report: Add option to analyze specific VM commit
    • kvm: Add live mode commit, commit
    • list: Skip unsupported events commit
    • perf report/top: Add option to collapse undesired parts of call graph commit
    • stat: Add support for --initial-delay option commit
    • symbols: Add support for reading from /proc/kcore commit
    • tools: Add 'S' event/group modifier to read sample value commit
    • tools: Default to cpu// for events v5 commit
    • tools: Make it possible to read object code from kernel modules commit
    • tools: Make it possible to read object code from vmlinux commit
    • top: Add --objdump option commit
    • trace: Allow specifying which syscalls to trace commit
    • trace: Implement -o/--output filename commit
    • trace: Support ! in -e expressions commit

12. Other news sites that track the changes of this release

