VFP Support in Linux
Introduction
This page describes the Linux support for the common VFP subarchitecture specification (both VFPv2 and VFPv3).
For a VFP application to run correctly on Linux, the kernel has to perform the saving and restoring of the VFP registers. In addition, there are exceptional conditions that need to be handled by the support code via the Undefined Instruction trap. This software entry is also known as a bounce. Note that only VFPv2 and VFPv3U variants can generate floating point exceptions. VFPv3 and NEON do not generate exceptions and the corresponding enable bits in FPSCR should be 0.
The VFP version and features available can be identified using the FPSID register and, on ARMv7, the MVFR0 and MVFR1 (Media and VFP Feature) registers. The other VFP registers consist of 16 or 32 (if Advanced SIMD instructions are supported) double-precision registers sharing the same location with 32 single-precision registers, FPEXC (Floating Point Exception), FPSCR (Floating Point Status and Control), FPINST and FPINST2 (Floating Point trigger Instructions, optional, depending on VFP implementation).
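To illustrate how the identification register can be read, the sketch below splits an FPSID value into its fields. The field positions are the ones commonly documented for FPSID and should be checked against the ARM ARM; the type and function names (fpsid_fields, fpsid_decode) are hypothetical and do not come from the kernel sources.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields (not a kernel type). */
struct fpsid_fields {
    unsigned implementer; /* bits [31:24], e.g. 0x41 for ARM */
    unsigned software;    /* bit  [23], set for a software-only implementation */
    unsigned subarch;     /* bits [22:16], VFP subarchitecture version */
    unsigned part;        /* bits [15:8] */
    unsigned variant;     /* bits [7:4] */
    unsigned revision;    /* bits [3:0] */
};

/* Split a raw FPSID value into its fields using shifts and masks. */
struct fpsid_fields fpsid_decode(uint32_t fpsid)
{
    struct fpsid_fields f;

    f.implementer = (fpsid >> 24) & 0xFFu;
    f.software    = (fpsid >> 23) & 0x1u;
    f.subarch     = (fpsid >> 16) & 0x7Fu;
    f.part        = (fpsid >> 8)  & 0xFFu;
    f.variant     = (fpsid >> 4)  & 0xFu;
    f.revision    = fpsid & 0xFu;
    return f;
}
```

On real hardware the value would be read with a privileged coprocessor register access; here it is simply passed in, so the function can be tried with any illustrative value such as 0x410330C3.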
For additional information on VFP, see A2.6, B1.8, B5.3 and Appendix B in the latest ARM Architecture Reference Manual.
Code overview
Most of the kernel VFP support code resides in the arch/arm/vfp/ directory:
- arch/arm/vfp/entry.S
- Basic entry code, called from the Undefined Instruction trap. This code redirects the VFP-specific undefined exception to the VFP support code.
- arch/arm/vfp/vfphw.S
- Low-level VFP entry code. This code handles the VFP registers saving and restoring, prepares the call parameters for the floating point support code (bounce) and handles the return to the user code that triggered the exception.
- arch/arm/vfp/vfpmodule.c
- VFP initialisation, entry bounce code, context switching (vfp_notifier) and user signal raising (SIGFPE)
- arch/arm/vfp/vfpsingle.c
- Single-precision floating point support code for IEEE754 compliance.
- arch/arm/vfp/vfpdouble.c
- Double-precision floating point support code for IEEE754 compliance.
- arch/arm/vfp/vfp.h
- Various helper functions for floating point representation.
- arch/arm/vfp/vfpinstr.h
- Function prototypes and inline assembly for accessing the coprocessor registers.
- include/asm-arm/vfp.h
- Definitions for VFP registers and bit masks.
- include/asm-arm/vfpmacros.h
- Assembler macros for backwards compatibility with older toolchains that do not understand all the VFP instructions.
In addition to the above, VFP-related pieces of code can be found in the files below:
- arch/arm/kernel/entry-armv.S
- The Undefined Instruction trap checks for VFP instructions and calls do_vfp.
- include/asm-arm/fpstate.h
- Defines the vfp_hard_struct structure containing space for the VFP register saving.
Initialisation
Initially, the vfp_vector variable defined in the vfpmodule.c file points to vfp_null_entry_point. The vfp_init function checks for the presence of the VFP coprocessor via the vfp_testing_entry and sets the vfp_vector to vfp_support_entry if available. The function also sets the HWCAP_VFP bit in elf_hwcap; this information is available to glibc via /proc/cpuinfo for FPSCR initialisation.
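The initialisation sequence amounts to swapping a function pointer once the hardware has been detected. A minimal user-space sketch of that pattern follows; every name ending in _sketch is hypothetical, the entry points are plain C stand-ins for the assembly routines, and the HWCAP bit value is an assumption that should be checked against the kernel headers.

```c
#include <stdbool.h>

#define HWCAP_VFP_SKETCH (1UL << 6) /* assumed to match HWCAP_VFP on ARM */

typedef int (*vfp_entry_fn)(void);

/* Stand-ins for the assembly entry points; return 1 if the trap was handled. */
int null_entry_sketch(void)    { return 0; } /* VFP absent: nothing to do */
int support_entry_sketch(void) { return 1; } /* VFP present: handle the bounce */

/* Models vfp_vector: starts out pointing at the null entry point. */
vfp_entry_fn vfp_vector_sketch = null_entry_sketch;

/* Models elf_hwcap: capability bits advertised to user space. */
unsigned long elf_hwcap_sketch;

/* Models vfp_init: the probe result is passed in instead of being detected. */
void vfp_init_sketch(bool vfp_present)
{
    if (!vfp_present)
        return;
    vfp_vector_sketch = support_entry_sketch;
    elf_hwcap_sketch |= HWCAP_VFP_SKETCH;
}
```

Routing every trap through the pointer keeps the Undefined Instruction path identical whether or not a VFP is fitted; only the target of the call changes.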
On ARMv6 and later architectures, full access to the CP10 and CP11 VFP coprocessors has to be enabled in the CPACR (Coprocessor Access Control) register, an operation performed by vfp_enable.
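The CPACR update is a simple bit operation. The sketch below computes the new register value, assuming the usual layout in which coprocessor n owns a two-bit field at bits [2n+1:2n] and the value 0b11 grants full access. The macro and function names are hypothetical, and the actual privileged read and write of the register are left out.

```c
#include <stdint.h>

/* Two-bit access field for coprocessor n; 0b11 is assumed to mean full access. */
#define CPACC_FULL_SKETCH(n) (3u << ((n) * 2))

/* Return the CPACR value with full access granted to CP10 and CP11,
 * leaving the access fields of all other coprocessors unchanged. */
uint32_t cpacr_enable_vfp_sketch(uint32_t cpacr)
{
    return cpacr | CPACC_FULL_SKETCH(10) | CPACC_FULL_SKETCH(11);
}
```

Starting from a CPACR value of 0, the result has bits 23:20 set and nothing else.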
Context switching
The saving and restoring of the VFP registers is done lazily in Linux to avoid unnecessary operations when an application does not use the VFP.
The vfp_init function registers vfp_notifier to be called during a context switch via the __switch_to and atomic_notifier_call_chain functions. The notifier disables the VFP by clearing the FPEXC.EN bit, causing any subsequent VFP instruction (except access to the FPEXC, FPSID, MVFR0 and MVFR1 registers) to trigger an Undefined Instruction exception.
The vfp_support_entry function called via the Undefined Instruction trap checks whether the VFP is disabled and a new thread requires access (the address of the current VFP state is stored in last_VFP_context defined in vfphw.S). If a new thread requires the VFP, the support entry code saves the current VFP registers state (if any) and restores the state for the new thread. By default, only the double-precision registers, FPEXC and FPSCR have to be saved or restored. If the FPEXC.EX bit is set, the FPINST and FPINST2 registers have to be saved or restored as well. Once the VFP is enabled by setting the FPEXC.EN bit, the execution resumes and the trigger VFP instruction is restarted.
If, following a context switch, the execution returns to the same thread, the VFP support entry code checks for any exceptional VFP state that needs addressing (FPEXC.EX bit set) to avoid an exception re-entry when restarting the trigger instruction.
On SMP systems, the threads can migrate to different CPUs and the lazy saving is no longer possible. The vfp_notifier function forces the VFP state saving (vfp_save_state in vfphw.S) if the VFP was enabled and used by the previous thread. It also clears the last_VFP_context variable in case a thread is executed on the original CPU after a series of migrations.
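The uniprocessor lazy scheme described in this section can be modelled in a few lines of user-space C. This is only a sketch under simplifying assumptions: the unit holds 16 double-precision registers and nothing else, the SMP forced save is not modelled, and all type and function names are hypothetical. The owner field plays the role of last_VFP_context and the enabled flag stands for FPEXC.EN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_DREGS 16

struct vfp_state {            /* hypothetical per-thread save area */
    double d[NUM_DREGS];
};

struct vfp_unit {             /* hypothetical model of one CPU's VFP */
    bool enabled;             /* models FPEXC.EN */
    double d[NUM_DREGS];      /* live hardware registers */
    struct vfp_state *owner;  /* models last_VFP_context */
    unsigned saves, restores; /* counters, for illustration only */
};

/* Context switch: only disable the unit; no registers are copied yet. */
void vfp_switch_sketch(struct vfp_unit *hw)
{
    hw->enabled = false;
}

/* Models the Undefined Instruction trap taken on the first VFP
 * instruction executed by thread 'next' after a context switch. */
void vfp_trap_sketch(struct vfp_unit *hw, struct vfp_state *next)
{
    assert(next != NULL);
    if (hw->owner != next) {
        if (hw->owner != NULL) {
            /* Save the registers of the thread that last used the VFP. */
            memcpy(hw->owner->d, hw->d, sizeof hw->d);
            hw->saves++;
        }
        /* Load the saved registers of the thread that now needs the VFP. */
        memcpy(hw->d, next->d, sizeof hw->d);
        hw->restores++;
        hw->owner = next;
    }
    hw->enabled = true; /* the trigger instruction can now be restarted */
}
```

When the same thread traps again after a switch, neither counter changes, which is the saving the lazy scheme is designed to obtain.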
Floating point exceptions
Floating point exceptions can be generated as a result of VFP instructions causing arithmetic exceptions or when additional computation is required in software. The latter is needed on some VFP implementations for full IEEE754 compliance.
Depending on the implementation, the exceptions can be generated synchronously or asynchronously. In the synchronous case, the VFP instruction that generated the exception (pointed to by LR-4) is the trigger instruction and the FPINST and FPINST2 registers are not valid (they might not even be present if only synchronous exceptions are implemented). In the asynchronous case, the trigger instruction opcode is read from the FPINST register. There can also be an additional VFP instruction stored in FPINST2 (only if the FPEXC.FP2V bit is set).
Asynchronous VFP exceptions are marked by the FPEXC.EX bit. If this bit is 0 and the FPEXC.DEX bit is set, the generated exception is a synchronous one. Version 1 of the Common VFP subarchitecture has a special behaviour when the FPSCR.IXE (Inexact eXception Enable) bit is set. In this case, the FPEXC.EX bit signals a synchronous exception in the same way as the FPEXC.DEX bit. Note that the VFP9 implementation (VFP subarchitecture 1 on ARM926EJ-S) has non-standard behaviour when the FPSCR.IXE bit is set. In this particular case, all the VFP instructions are bounced synchronously without setting the FPEXC.EX or FPEXC.DEX bits.
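The rules above for telling synchronous from asynchronous exceptions can be written as a small decision function. The sketch assumes the commonly documented bit positions (FPEXC.EX at bit 31, FPEXC.DEX at bit 29, FPSCR.IXE at bit 12); the enum and function names are hypothetical, and the VFP9 special case, where neither bit is set, is deliberately not handled.

```c
#include <stdbool.h>
#include <stdint.h>

#define FPEXC_EX  (1u << 31) /* assumed position of FPEXC.EX */
#define FPEXC_DEX (1u << 29) /* assumed position of FPEXC.DEX */
#define FPSCR_IXE (1u << 12) /* assumed position of FPSCR.IXE */

enum vfp_exc_kind { VFP_EXC_NONE, VFP_EXC_SYNC, VFP_EXC_ASYNC };

/* Classify a bounce from the saved FPEXC and FPSCR values.
 * subarch_v1 is true for version 1 of the Common VFP subarchitecture. */
enum vfp_exc_kind vfp_classify_sketch(uint32_t fpexc, uint32_t fpscr,
                                      bool subarch_v1)
{
    if (fpexc & FPEXC_EX) {
        /* On subarchitecture v1 with IXE set, EX signals a synchronous exception. */
        if (subarch_v1 && (fpscr & FPSCR_IXE))
            return VFP_EXC_SYNC;
        return VFP_EXC_ASYNC;
    }
    if (fpexc & FPEXC_DEX)
        return VFP_EXC_SYNC;
    return VFP_EXC_NONE; /* no exceptional state flagged in FPEXC */
}
```

The result decides where the trigger instruction comes from: the instruction at LR-4 for a synchronous exception, FPINST for an asynchronous one.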
The VFP entry code calls the VFP_bounce function, which checks the exception type and emulates the trigger instructions (vfp_emulate_instruction). If any arithmetic exceptions are generated by the VFP emulation code, the bounce function raises SIGFPE against the current thread.
An application can ignore the arithmetic exceptions by clearing the corresponding enable bits (12:8) in the FPSCR register (see A2.6.4 in the ARM ARM). If an application does not require full IEEE754 compliance and the VFP implementation only handles normalised numbers and zeros in hardware, it can disable the software emulation by setting the Flush-to-Zero mode via the FPSCR.FZ bit. In this case, all the denormalised numbers are treated as zero.
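As a small illustration of these two FPSCR settings, the sketch below computes a new FPSCR value with the exception enable bits (12:8) cleared and Flush-to-Zero set. It assumes FPSCR.FZ is bit 24; the macro and function names are hypothetical, and reading or writing the real register (done with a VFP system register transfer instruction) is left out.

```c
#include <stdint.h>

#define FPSCR_TRAP_ENABLES_SKETCH (0x1Fu << 8) /* bits 12:8, exception enables */
#define FPSCR_FZ_SKETCH           (1u << 24)   /* assumed position of FPSCR.FZ */

/* Return 'fpscr' with arithmetic exception trapping disabled and
 * Flush-to-Zero mode enabled; all other bits are preserved. */
uint32_t fpscr_no_traps_ftz_sketch(uint32_t fpscr)
{
    return (fpscr & ~FPSCR_TRAP_ENABLES_SKETCH) | FPSCR_FZ_SKETCH;
}
```

Bits outside the two fields, such as the cumulative exception flags, pass through unchanged.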
Floating point emulation
The floating point emulation is handled by the vfp_emulate_instruction function defined in vfpmodule.c. It checks the instruction type and invokes either vfp_single_cpdo or vfp_double_cpdo (defined in vfpsingle.c and vfpdouble.c). These functions decode the trigger instruction and emulate it in software. The result of the emulation is written back to the VFP registers via the vfp_put_{float|double} functions defined in vfphw.S.
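The choice between the single- and double-precision paths can be sketched from the coprocessor number field of the trigger instruction. The code below assumes that the field sits in bits [11:8] and that CP10 carries single-precision and CP11 double-precision operations; it does not reproduce the kernel's decoding, and the names are hypothetical.

```c
#include <stdint.h>

enum vfp_precision { VFP_SINGLE, VFP_DOUBLE, VFP_NOT_VFP };

/* Select the emulation path for a coprocessor data operation by
 * looking at its coprocessor number field (assumed bits [11:8]). */
enum vfp_precision vfp_precision_sketch(uint32_t inst)
{
    switch ((inst >> 8) & 0xFu) {
    case 10:
        return VFP_SINGLE;  /* CP10: would go to the single-precision code */
    case 11:
        return VFP_DOUBLE;  /* CP11: would go to the double-precision code */
    default:
        return VFP_NOT_VFP; /* some other coprocessor */
    }
}
```

The caller is expected to have already established that the word is a coprocessor data operation; the function only inspects the coprocessor number.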
Advanced SIMD (NEON) extension
The Advanced SIMD is an optional extension to the ARMv7 architecture providing a combined 64-bit and 128-bit SIMD (Single Instruction Multiple Data) instruction set for accelerating media and signal processing.
It uses the same registers and coprocessors as the VFP extension, allowing the existing VFP support code to be used for NEON state saving and restoring. The Advanced SIMD instructions do not generate arithmetic exceptions and no software emulation is needed.
There are Advanced SIMD instructions that do not have a coprocessor bit field in their opcode. To handle them properly, the Undefined Instruction trap requires additional checks for these encodings before calling the VFP support code. For more information see Chapter A7, "Advanced SIMD and VFP Instruction Encoding" in the latest ARM ARM.
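The kind of additional check involved can be sketched as a mask-and-compare on the instruction word. The two patterns below are an assumption covering ARM-state encodings only (data processing, and element or structure load/store) and should be verified against Chapter A7 of the ARM ARM; Thumb encodings are not covered and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if 'inst' looks like an ARM-state Advanced SIMD
 * instruction that has no coprocessor number field. */
bool is_neon_arm_opcode_sketch(uint32_t inst)
{
    /* Assumed pattern: Advanced SIMD data processing, top bits 1111 001x. */
    if ((inst & 0xFE000000u) == 0xF2000000u)
        return true;
    /* Assumed pattern: element or structure load/store, 1111 0100 xxx0. */
    if ((inst & 0xFF100000u) == 0xF4000000u)
        return true;
    return false;
}
```

A trap handler would run a test of this kind before the usual coprocessor number check, so that these instructions also reach the VFP support code.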
--
CatalinMarinas - 19 Sep 2007