You are here: LinuxKernel Web>LinuxVFP (30 May 2008)

VFP Support in Linux

Introduction

This page describes the Linux support for the common VFP subarchitecture specification (both VFPv2 and VFPv3).

For a VFP application to run correctly on Linux, the kernel has to save and restore the VFP registers. In addition, some exceptional conditions have to be handled by the support code, which is entered via the Undefined Instruction trap; this software entry is also known as a bounce. Note that only the VFPv2 and VFPv3U variants can generate floating point exceptions. VFPv3 and NEON do not generate exceptions, and the corresponding exception enable bits in FPSCR should be 0.

The VFP version and available features can be identified using the FPSID register and, on ARMv7, the MVFR0 and MVFR1 (Media and VFP Feature) registers. The other VFP registers are: 16 or 32 (if the Advanced SIMD instructions are supported) double-precision registers, the first 16 of which share storage with the 32 single-precision registers; FPEXC (Floating Point Exception); FPSCR (Floating Point Status and Control); and FPINST and FPINST2 (Floating Point trigger Instructions, optional, depending on the VFP implementation).
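As an illustration of feature identification, the sketch below splits an FPSID value into its fields. It assumes the ARMv7 (VFPv3) FPSID layout; the older VFPv2 layout divides bits [22:16] into separate format, single-precision-only and architecture fields, so the subarchitecture decode would differ there. Reading the register itself (with VMRS or the kernel's fmrx helper) is not shown, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the FPSID fields, ARMv7 layout:
 * implementer [31:24], SW [23], subarchitecture [22:16],
 * part number [15:8], variant [7:4], revision [3:0]. */
struct fpsid_fields {
	uint8_t implementer; /* 0x41 = ARM Limited */
	uint8_t software;    /* 1 = software-only implementation */
	uint8_t subarch;     /* VFP subarchitecture version */
	uint8_t part;        /* implementation-defined part number */
	uint8_t variant;
	uint8_t revision;
};

/* Decode an FPSID value that has already been read from hardware. */
static struct fpsid_fields decode_fpsid(uint32_t fpsid)
{
	struct fpsid_fields f;

	f.implementer = (uint8_t)(fpsid >> 24);
	f.software    = (uint8_t)((fpsid >> 23) & 0x1u);
	f.subarch     = (uint8_t)((fpsid >> 16) & 0x7fu);
	f.part        = (uint8_t)((fpsid >> 8) & 0xffu);
	f.variant     = (uint8_t)((fpsid >> 4) & 0xfu);
	f.revision    = (uint8_t)(fpsid & 0xfu);
	return f;
}
```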

For additional information on VFP, see A2.6, B1.8, B5.3 and Appendix B in the latest ARM Architecture Reference Manual.

Code overview

Most of the kernel VFP support code resides in the arch/arm/vfp/ directory:

arch/arm/vfp/entry.S
Basic entry code, called from the Undefined Instruction trap. This code redirects the VFP-specific undefined exception to the VFP support code.
arch/arm/vfp/vfphw.S
Low-level VFP entry code. This code saves and restores the VFP registers, prepares the call parameters for the floating point support code (bounce) and handles the return to the user code that triggered the exception.
arch/arm/vfp/vfpmodule.c
VFP initialisation, the bounce entry code, context switching (vfp_notifier) and raising user signals (SIGFPE).
arch/arm/vfp/vfpsingle.c
Single-precision floating point support code for IEEE754 compliance.
arch/arm/vfp/vfpdouble.c
Double-precision floating point support code for IEEE754 compliance.
arch/arm/vfp/vfp.h
Various helper functions for floating point representation.
arch/arm/vfp/vfpinstr.h
Function prototypes and inline assembly for accessing the coprocessor registers.
include/asm-arm/vfp.h
Definitions for VFP registers and bit masks.
include/asm-arm/vfpmacros.h
Assembler macros for backwards compatibility with older toolchains that do not understand all the VFP instructions.

In addition to the above, VFP-related pieces of code can be found in the files below:

arch/arm/kernel/entry-armv.S
The Undefined Instruction trap checks for VFP instructions and calls do_vfp.
include/asm-arm/fpstate.h
Defines the vfp_hard_struct structure, which holds the saved VFP register state.

Initialisation

Initially, the vfp_vector variable defined in vfpmodule.c points to vfp_null_entry_point. The vfp_init function checks for the presence of the VFP coprocessor with the help of vfp_testing_entry and, if one is found, sets vfp_vector to vfp_support_entry. The function also sets the HWCAP_VFP bit in elf_hwcap. This information is exported to user space (in the AT_HWCAP entry of the ELF auxiliary vector and in /proc/cpuinfo), where glibc can use it, for example for FPSCR initialisation.
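A user-space program can perform the same kind of check. The sketch below assumes a Linux C library that provides getauxval() (glibc 2.16 or later, or musl) and hard-codes HWCAP_VFP as bit 6, matching the 32-bit ARM kernel's asm/hwcap.h; the helper names and the EXAMPLE_ macro are hypothetical.

```c
#include <stdbool.h>
#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>
#endif

/* HWCAP_VFP from the 32-bit ARM kernel's <asm/hwcap.h> (bit 6).
 * Defined locally so the sketch also compiles on other hosts. */
#define EXAMPLE_HWCAP_VFP (1UL << 6)

/* Pure helper: does an AT_HWCAP word advertise the VFP? */
static bool hwcap_has_vfp(unsigned long hwcap)
{
	return (hwcap & EXAMPLE_HWCAP_VFP) != 0;
}

/* Query the running kernel. Only meaningful on 32-bit ARM Linux,
 * where AT_HWCAP carries the elf_hwcap bits set by vfp_init. */
static bool running_with_vfp(void)
{
#if defined(__linux__) && defined(__arm__)
	return hwcap_has_vfp(getauxval(AT_HWCAP));
#else
	return false; /* other architectures use different HWCAP bits */
#endif
}
```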

On ARMv6 and later architectures, full access to the CP10 and CP11 coprocessors (which implement the VFP) must be enabled in the CPACR (Coprocessor Access Control Register); this is done by vfp_enable.
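The CPACR holds a 2-bit access field per coprocessor, with 0b11 meaning full access. A minimal sketch of the value computation follows; on real hardware the register would be read and written in a privileged mode with MRC/MCR p15, 0, <Rt>, c1, c0, 2 (followed by an ISB on ARMv7), which is omitted here. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Coprocessor n has a 2-bit access field at CPACR bits [2n+1:2n];
 * 0b11 grants full access from privileged and user modes. */
#define EXAMPLE_CPACC_FULL(n) (3u << ((n) * 2))

/* Return the CPACR value granting full access to CP10 and CP11
 * (the VFP coprocessors) while preserving all other fields. */
static uint32_t cpacr_enable_vfp(uint32_t cpacr)
{
	return cpacr | EXAMPLE_CPACC_FULL(10) | EXAMPLE_CPACC_FULL(11);
}
```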

Context switching

Linux saves and restores the VFP registers lazily, to avoid unnecessary work when an application does not use the VFP.

The vfp_init function registers vfp_notifier, which is called on every context switch from __switch_to via atomic_notifier_call_chain. vfp_notifier disables the VFP by clearing the FPEXC.EN bit, so that any subsequent VFP instruction (other than accesses to the FPEXC, FPSID, MVFR0 and MVFR1 registers) triggers an Undefined Instruction exception.
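The effect of the context-switch hook on FPEXC amounts to a single bit operation. In the sketch below, the FPEXC_EN name mirrors the kernel's VFP header, but the helpers are hypothetical and operate on a value rather than on the register itself (which the kernel accesses through its fmrx/fmxr helpers).

```c
#include <stdbool.h>
#include <stdint.h>

#define FPEXC_EN (1u << 30) /* VFP enable bit */

/* Clear EN so that the next VFP instruction from any thread traps,
 * leaving the remaining FPEXC bits untouched. */
static uint32_t fpexc_disable(uint32_t fpexc)
{
	return fpexc & ~FPEXC_EN;
}

/* True if a given FPEXC value has the VFP enabled. */
static bool vfp_enabled(uint32_t fpexc)
{
	return (fpexc & FPEXC_EN) != 0;
}
```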

The vfp_support_entry function, called via the Undefined Instruction trap, checks whether the VFP is disabled and a different thread now requires access to it (the address of the VFP state currently held in the hardware registers is stored in last_VFP_context, defined in vfphw.S). If so, the support entry code saves the current VFP register state (if any) and restores the state of the new thread. By default, only the double-precision registers, FPEXC and FPSCR have to be saved or restored. If the FPEXC.EX bit is set, the FPINST and FPINST2 registers have to be saved or restored as well. Once the VFP is enabled by setting the FPEXC.EN bit, execution resumes and the triggering VFP instruction is restarted.
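The lazy save/restore decision can be modelled in ordinary C. In this hypothetical uniprocessor model, the owner field plays the role of last_VFP_context, a context switch only clears the enable flag, and registers are copied in the trap handler only when a different thread wants the VFP. FPINST/FPINST2 handling is left out for brevity, and none of the names are the kernel's.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-thread VFP state. */
struct vfp_state {
	uint64_t d[32];           /* double-precision registers */
	uint32_t fpexc, fpscr;
};

struct vfp_hw {
	struct vfp_state regs;    /* live hardware register contents */
	int enabled;              /* models FPEXC.EN */
	struct vfp_state *owner;  /* models last_VFP_context */
	unsigned saves, restores; /* counters for illustration */
};

/* Context switch: only disable the VFP; no registers are copied. */
static void model_switch(struct vfp_hw *hw)
{
	hw->enabled = 0;
}

/* Undefined Instruction trap taken because the VFP was disabled. */
static void model_vfp_trap(struct vfp_hw *hw, struct vfp_state *cur)
{
	if (hw->owner != cur) {
		if (hw->owner) { /* save the previous owner's registers */
			memcpy(hw->owner, &hw->regs, sizeof(hw->regs));
			hw->saves++;
		}
		memcpy(&hw->regs, cur, sizeof(hw->regs)); /* load new state */
		hw->restores++;
		hw->owner = cur;
	}
	hw->enabled = 1; /* set EN; the trapping instruction is restarted */
}
```

If the same thread traps again after a switch, no copying happens at all; that is the saving lazy switching buys.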

If, following a context switch, execution returns to the same thread, the VFP support entry code checks for any pending exceptional VFP state (FPEXC.EX bit set) that needs handling, to avoid re-entering the exception when the triggering instruction is restarted.

On SMP systems, threads can migrate between CPUs, so lazy saving is no longer possible: a thread's live VFP registers could otherwise be left behind on a CPU it no longer runs on. The vfp_notifier function forces the VFP state to be saved (vfp_save_state in vfphw.S) if the VFP was enabled and used by the previous thread. It also clears the last_VFP_context variable in case a thread runs again on the original CPU after a series of migrations.
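The SMP behaviour can be sketched with a per-CPU model. This is a simplified, hypothetical illustration rather than the kernel code: each saved state records the CPU it was last saved from, the switch hook saves eagerly whenever the VFP was enabled, and the per-CPU owner is cleared when the incoming thread was last saved on another CPU, so that stale registers are never reused.

```c
#include <stddef.h>
#include <stdint.h>

struct vfp_regs {
	uint64_t d[32];
	uint32_t fpexc, fpscr;
};

/* Per-thread saved state, tagged with the CPU it was last saved from. */
struct vfp_state {
	struct vfp_regs saved;
	int cpu;
};

/* Per-CPU view of the VFP hardware in this hypothetical model. */
struct vfp_cpu {
	int id;
	struct vfp_regs live;    /* registers currently in this CPU's VFP */
	int enabled;             /* models FPEXC.EN */
	struct vfp_state *owner; /* models this CPU's last_VFP_context */
};

/* Switch hook run on CPU c before the thread owning `next` runs. */
static void model_smp_switch(struct vfp_cpu *c, const struct vfp_state *next)
{
	/* VFP enabled means its owner used it since the last switch:
	 * save now, because that thread may next run on another CPU. */
	if (c->enabled && c->owner) {
		c->owner->saved = c->live;
		c->owner->cpu = c->id;
	}
	/* Incoming thread was last saved elsewhere: this CPU's registers
	 * may be stale for it, so force a reload on its first VFP use. */
	if (next->cpu != c->id)
		c->owner = NULL;
	c->enabled = 0; /* clear FPEXC.EN as in the uniprocessor case */
}
```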

Floating point exceptions

Floating point exceptions can be generated either when VFP instructions cause arithmetic exceptions or when the hardware requires additional computation to be done in software. The latter is needed on some VFP implementations for full IEEE754 compliance.

Depending on the implementation, exceptions can be generated synchronously or asynchronously. For synchronous exceptions, the trigger instruction is the VFP instruction that generated the exception (pointed to by LR-4), and the FPINST and FPINST2 registers are not valid (they might not even be present if only synchronous exceptions are implemented). For asynchronous exceptions, the trigger instruction opcode is read from the FPINST register. An additional VFP instruction may also be stored in FPINST2 (only if the FPEXC.FP2V bit is set).
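The choice of trigger instruction(s) can be summarised in a small function. This sketch takes already-read values as parameters (the word at LR-4, FPINST and FPINST2), uses the standard FPEXC bit positions, and deliberately ignores the subarchitecture 1 and VFP9 special cases described below; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FPEXC_EX   (1u << 31) /* asynchronous exception pending */
#define FPEXC_FP2V (1u << 28) /* FPINST2 holds a valid instruction */

/* Store the instruction(s) to emulate in out[] and return how many.
 * faulting_insn is the word at LR-4; fpinst/fpinst2 are only used
 * for asynchronous exceptions. */
static size_t trigger_instructions(uint32_t fpexc, uint32_t faulting_insn,
				   uint32_t fpinst, uint32_t fpinst2,
				   uint32_t out[2])
{
	if (!(fpexc & FPEXC_EX)) { /* synchronous: the trapping instruction */
		out[0] = faulting_insn;
		return 1;
	}
	out[0] = fpinst;           /* asynchronous: taken from FPINST */
	if (fpexc & FPEXC_FP2V) {  /* a second instruction is pending */
		out[1] = fpinst2;
		return 2;
	}
	return 1;
}
```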

Asynchronous VFP exceptions are marked by the FPEXC.EX bit. If this bit is 0 and the FPEXC.DEX bit is set, the exception is synchronous. Version 1 of the Common VFP subarchitecture behaves specially when the FPSCR.IXE (Inexact eXception Enable) bit is set: in this case, the FPEXC.EX bit signals a synchronous exception in the same way as the FPEXC.DEX bit. Note that the VFP9 implementation (VFP subarchitecture 1 on ARM926EJ-S) has non-standard behaviour when the FPSCR.IXE bit is set: all VFP instructions are bounced synchronously without setting either the FPEXC.EX or the FPEXC.DEX bit.
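The classification rules above can be written as a decision function. It is a hypothetical sketch that takes FPEXC, FPSCR and a flag saying whether the hardware implements subarchitecture version 1; it treats the VFP9 case (IXE set, neither EX nor DEX) as synchronous and reports any other combination as unexpected.

```c
#include <stdbool.h>
#include <stdint.h>

#define FPEXC_EX  (1u << 31)
#define FPEXC_DEX (1u << 29)
#define FPSCR_IXE (1u << 12) /* Inexact eXception Enable */

enum bounce_kind { BOUNCE_SYNC, BOUNCE_ASYNC, BOUNCE_UNEXPECTED };

static enum bounce_kind classify_bounce(uint32_t fpexc, uint32_t fpscr,
					bool subarch_v1)
{
	if (fpexc & FPEXC_EX) {
		/* Subarchitecture 1 with IXE set: EX means synchronous. */
		if (subarch_v1 && (fpscr & FPSCR_IXE))
			return BOUNCE_SYNC;
		return BOUNCE_ASYNC;
	}
	if (fpexc & FPEXC_DEX)
		return BOUNCE_SYNC;
	/* Neither bit set: the VFP9 quirk with IXE set bounces synchronously. */
	if (fpscr & FPSCR_IXE)
		return BOUNCE_SYNC;
	return BOUNCE_UNEXPECTED;
}
```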

The VFP entry code calls the VFP_bounce function, which checks the exception type and emulates the trigger instructions (vfp_emulate_instruction). If the emulation code generates any arithmetic exceptions that are enabled in FPSCR, the bounce function raises SIGFPE against the current thread.
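Mapping the emulated exceptions to a SIGFPE si_code could look like the sketch below. It assumes the standard FPSCR layout (cumulative flags in bits 4:0, the matching enable bits eight positions higher) and the POSIX FPE_* codes from <signal.h>; the priority order between simultaneous exceptions is a choice made for this example, not necessarily the kernel's.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>

/* FPSCR cumulative exception flags; enable bits sit 8 bits higher. */
#define FPSCR_IOC (1u << 0) /* invalid operation */
#define FPSCR_DZC (1u << 1) /* division by zero */
#define FPSCR_OFC (1u << 2) /* overflow */
#define FPSCR_UFC (1u << 3) /* underflow */
#define FPSCR_IXC (1u << 4) /* inexact */
#define FPSCR_ENABLE_SHIFT 8

/* Return the si_code for the enabled exceptions raised by an emulated
 * instruction, or 0 if no signal should be delivered. */
static int sigfpe_code(uint32_t exceptions, uint32_t fpscr)
{
	uint32_t trapped = exceptions & (fpscr >> FPSCR_ENABLE_SHIFT);

	if (trapped & FPSCR_IOC) return FPE_FLTINV;
	if (trapped & FPSCR_DZC) return FPE_FLTDIV;
	if (trapped & FPSCR_OFC) return FPE_FLTOVF;
	if (trapped & FPSCR_UFC) return FPE_FLTUND;
	if (trapped & FPSCR_IXC) return FPE_FLTRES;
	return 0;
}
```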

An application can ignore arithmetic exceptions by clearing the corresponding enable bits (12:8) in the FPSCR register (see A2.6.4 in the ARM ARM). If an application does not require full IEEE754 compliance and the VFP implementation handles only normalised numbers and zeros in hardware, it can avoid the software emulation by enabling Flush-to-Zero mode via the FPSCR.FZ bit. In this mode, all denormalised numbers are treated as zero.
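Both adjustments are simple bit operations on FPSCR. The sketch below assumes FZ is bit 24 and the trap enables occupy bits 12:8, as described in the ARM ARM, and computes the new value only; an application would read and write the register with VMRS/VMSR (FMRX/FMXR with older assemblers), which is not shown. The constant and function names are hypothetical.

```c
#include <stdint.h>

#define FPSCR_FZ           (1u << 24)    /* Flush-to-Zero mode */
#define FPSCR_TRAP_ENABLES (0x1Fu << 8)  /* IOE, DZE, OFE, UFE, IXE */

/* FPSCR value with arithmetic exception traps disabled (so no SIGFPE)
 * and denormals flushed to zero; other fields are preserved. */
static uint32_t fpscr_fast_mode(uint32_t fpscr)
{
	return (fpscr & ~FPSCR_TRAP_ENABLES) | FPSCR_FZ;
}
```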

Floating point emulation

The floating point emulation is handled by the vfp_emulate_instruction function defined in vfpmodule.c. It checks the instruction type and invokes either vfp_single_cpdo or vfp_double_cpdo (defined in vfpsingle.c and vfpdouble.c). These functions decode the trigger instruction and emulate it in software. The result of the emulation is written back to the VFP registers via the vfp_put_{float|double} functions defined in vfphw.S.
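The dispatch by instruction type can be illustrated with a coarse decoder. It is a hypothetical sketch assuming the input is already known to be an ARM-state VFP instruction on coprocessor 10 or 11; the two data-processing results correspond to the cases handed to vfp_single_cpdo and vfp_double_cpdo, with coprocessor 10 meaning single and 11 double precision.

```c
#include <stdint.h>

enum vfp_insn_class { VFP_CPDO_SINGLE, VFP_CPDO_DOUBLE, VFP_CPRT, VFP_CPDT };

/* Bits [27:24] == 0b1110: data-processing (CDP) or register transfer
 * (MCR/MRC), told apart by bit 4; anything else is treated as a
 * load/store. Bits [11:8] give the coprocessor number (10 or 11). */
static enum vfp_insn_class classify_vfp_insn(uint32_t insn)
{
	if ((insn & 0x0f000000u) == 0x0e000000u) {
		if (insn & (1u << 4))
			return VFP_CPRT;
		return ((insn & 0x00000f00u) == 0x00000a00u) ? VFP_CPDO_SINGLE
							     : VFP_CPDO_DOUBLE;
	}
	return VFP_CPDT;
}
```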

Advanced SIMD (NEON) extension

The Advanced SIMD extension is an optional extension to the ARMv7 architecture, providing a combined 64-bit and 128-bit SIMD (Single Instruction Multiple Data) instruction set for accelerating media and signal processing.

It uses the same registers and coprocessors as the VFP extension, allowing the existing VFP support code to be used for saving and restoring the NEON state. The Advanced SIMD instructions do not generate arithmetic exceptions, so no software emulation is needed.

Some Advanced SIMD instructions do not have a coprocessor number field in their opcode. To handle them properly, the Undefined Instruction trap requires additional checks for these encodings before calling the VFP support code. For more information, see Chapter A7, "Advanced SIMD and VFP Instruction Encoding", in the latest ARM ARM.
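A check of this kind can be expressed as a table of mask/value pairs. The two ARM-state patterns below (unconditional Advanced SIMD data-processing instructions, and element/structure load/store instructions) are assumed from the ARM ARM encoding tables; Thumb-2 encodings differ and are not covered, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* {mask, value} pairs for ARM-state Advanced SIMD encodings that
 * carry no coprocessor number field. */
static const uint32_t neon_patterns[][2] = {
	{ 0xfe000000u, 0xf2000000u }, /* data-processing: 1111 001x */
	{ 0xff100000u, 0xf4000000u }, /* element/structure load/store */
};

/* True if insn should be routed to the VFP/NEON support code even
 * though it does not name coprocessor 10 or 11. */
static bool is_neon_arm_insn(uint32_t insn)
{
	for (size_t i = 0; i < sizeof(neon_patterns) / sizeof(neon_patterns[0]); i++)
		if ((insn & neon_patterns[i][0]) == neon_patterns[i][1])
			return true;
	return false;
}
```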

-- CatalinMarinas - 19 Sep 2007
 