Vector Math Library (VML) Notes
for Intel® Architecture Processors

Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. Intel may make changes to specifications and product descriptions at any time, without notice.

This document describes the Vector Math Library (VML), which is designed to compute elementary functions on vector arguments. VML is an integral part of the Intel® Math Kernel Library; the term VML is used here for simplicity when discussing this group of functions.

VML includes a set of highly optimized implementations of certain computationally expensive core mathematical functions (power, trigonometric, exponential, hyperbolic, etc.) that operate on vectors. VML may significantly improve performance for applications such as nonlinear programming software, computation of integrals, and many others.

Each VML vector function, for each data format, can work in two modes: High Accuracy (HA) and Low Accuracy (LA). For many functions, the LA version improves performance at the cost of accuracy. For some functions, relaxing the accuracy improves performance very little, so the same implementation is used for both versions. Error behavior depends not only on whether the HA or LA version is chosen, but also on the processor on which the software runs. For instance, Streaming SIMD Extensions (SSE) instructions have different arithmetic behavior than x87 instructions, so results can differ between an Intel® Pentium® processor and an Intel® Pentium® 4 processor, which uses SSE2 instructions. In addition, special value behavior may differ between the HA and LA versions of a function. For more information, see the special value behavior and accuracy sections.
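The accuracy-for-speed trade-off behind the HA and LA modes can be illustrated with a small sketch. It is not VML code and does not reflect how VML implements either mode: the function names `exp_taylor` and `max_relative_error` are hypothetical, and a truncated Taylor polynomial is used only because it makes the relationship between work per element and error easy to see.

```python
import math

def exp_taylor(values, terms):
    """Approximate exp(x) for each x using the first `terms` Taylor terms (Horner form)."""
    results = []
    for x in values:
        acc = 1.0
        # Horner evaluation of 1 + x/1 * (1 + x/2 * (1 + x/3 * ...))
        for k in range(terms - 1, 0, -1):
            acc = 1.0 + x * acc / k
        results.append(acc)
    return results

def max_relative_error(values, approx):
    """Largest relative error of `approx` against math.exp over `values`."""
    return max(abs(a - math.exp(x)) / math.exp(x) for x, a in zip(values, approx))

# Illustrative argument interval [-0.5, 0.5]; chosen for this sketch only.
xs = [i / 64.0 for i in range(-32, 33)]

low = exp_taylor(xs, terms=6)    # fewer operations per element, larger error
high = exp_taylor(xs, terms=14)  # more operations per element, smaller error

print("6 terms :", max_relative_error(xs, low))
print("14 terms:", max_relative_error(xs, high))
```

The shorter polynomial does less arithmetic per element and is correspondingly less accurate, which is the kind of trade a low accuracy variant makes.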

This document gives a more detailed description of the performance and accuracy properties of VML functions. It covers three topics (performance, accuracy, and special value processing) at two levels of detail: brief information for all functions in one table, and more detailed information for each function on a separate page.

Performance issues: Performance numbers in the respective tables are given for so-called "working" argument intervals. Performance may differ on other intervals. For example, computing trigonometric functions on "huge" arguments is expensive, because performance is sacrificed to obtain the required accuracy. The page for each function lists the working interval over which performance is measured. The same page contains graphs that show how performance depends on vector length. There are two extreme cases, so-called "short" and "long" vectors (a logarithmic scale is used to show both). For short vectors there are loop organization and initialization overheads. The cost of these overheads is amortized as vector length increases, and for vectors longer than a few dozen elements performance remains nearly flat until the vector no longer fits in the L2 cache.

Data prefetching on the Intel® Pentium® III processor (explicit data prefetch in software) and the Pentium® 4 processor (implicit data prefetch in hardware) greatly reduces the out-of-cache problem.
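The way fixed per-call overhead is amortized over vector length can be sketched with a simple cost model. The function `cycles_per_element` and the numbers passed to it (200 cycles of overhead, 20 cycles per element) are hypothetical values chosen for illustration, not measurements of any VML function, and the model deliberately ignores cache effects.

```python
def cycles_per_element(n, overhead_cycles, per_element_cycles):
    """Average cost per element when a fixed per-call overhead is spread over n elements."""
    if n <= 0:
        raise ValueError("vector length must be positive")
    return per_element_cycles + overhead_cycles / n

# Hypothetical costs, for illustration only.
OVERHEAD = 200.0
PER_ELEMENT = 20.0

for n in (1, 10, 100, 1000, 10000):
    cost = cycles_per_element(n, OVERHEAD, PER_ELEMENT)
    print(f"n = {n:>5}: {cost:.2f} cycles per element")
```

Under these assumed costs the per-element figure falls quickly and is already within about 10% of its asymptote at a hundred elements, which matches the flat region described above. The rise once the data exceeds the L2 cache is outside this model.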

Accuracy issues: The design requirement for the HA functions is an error of less than 1.0 ulp (unit in the last place), with all special values processed correctly. For the LA versions some of these requirements are relaxed. For more details see the accuracy table with ulp errors for all functions, the page for each function, and the special value behavior table for all functions.
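As a rough illustration of what an error expressed in ulps means, the sketch below measures a double precision result against a higher precision reference. It assumes Python 3.9 or later (for `math.ulp`), uses the standard `decimal` module as the reference, and tests Python's own `math.exp` rather than any VML function. The names `ulp_error` and `max_ulp_error_exp` are hypothetical, and this is not necessarily the procedure used to produce the accuracy tables.

```python
import math
from decimal import Decimal, getcontext

# Reference precision well beyond the roughly 16 decimal digits of a double.
getcontext().prec = 50

def ulp_error(computed, reference):
    """Error of the float `computed` against the Decimal `reference`, in ulps of the reference."""
    ulp = Decimal(math.ulp(float(reference)))
    return abs(Decimal(computed) - reference) / ulp

def max_ulp_error_exp(values):
    """Largest ulp error of math.exp over `values`, using Decimal.exp() as the reference."""
    return max(ulp_error(math.exp(x), Decimal(x).exp()) for x in values)

# Illustrative sample of arguments in [-5, 5].
sample = [i / 8.0 for i in range(-40, 41)]
print("max ulp error of math.exp on the sample:", float(max_ulp_error_exp(sample)))
```

A result that is off by exactly one representable double from the true value has an error of 1 ulp by this measure, so a requirement of less than 1.0 ulp leaves room only for rounding to one of the two doubles nearest the exact result.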

Special value processing issues: Special values are processed in accordance with the C9X standard (the working name of what became ISO C99). For the full list of special values for every function, see the corresponding table. That table also shows whether the implementation of a particular function meets the special value requirements, marked "+" or "-". Note that for HA the mark should always be "+".
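A check of this kind can be sketched as follows. The sketch tests Python's `math.exp`, not a VML function, against the results that C99 (the standard the C9X drafts became) specifies for exp in Annex F, and reports "+" or "-" per case. The names `same_value`, `EXP_SPECIAL_CASES`, and `special_value_marks` are hypothetical, and only return values are compared; floating-point exception flags are not examined.

```python
import math

def same_value(a, b):
    """Equality that treats NaN as matching NaN and distinguishes +0.0 from -0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)

# (argument, expected result) pairs for exp, following C99 Annex F.
EXP_SPECIAL_CASES = [
    (0.0, 1.0),
    (-0.0, 1.0),
    (math.inf, math.inf),
    (-math.inf, 0.0),
    (math.nan, math.nan),
]

def special_value_marks(func, cases):
    """Return '+' for each case where func(arg) matches the expected value, else '-'."""
    marks = []
    for arg, expected in cases:
        try:
            ok = same_value(func(arg), expected)
        except (ValueError, OverflowError):
            # In this sketch a raised error counts as not returning the expected value.
            ok = False
        marks.append("+" if ok else "-")
    return marks

print(special_value_marks(math.exp, EXP_SPECIAL_CASES))
```

The signed-zero and NaN handling in `same_value` matters because ordinary `==` reports `0.0 == -0.0` as true and `nan == nan` as false, either of which would give a misleading mark.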

List of VML Functions
Performance of All VML Functions
Measured Accuracy of All VML Functions
Special Values of All VML Functions

To ensure a correct display of this document, use the following recommended browser versions: Internet Explorer* 5.5 or higher (on Windows*), Netscape* 4.79 or Mozilla* 1.2.1 or higher (on Linux*).

Celeron, Dialogic, EM64T, i386, i486, iCOMP, Intel, Intel logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetStructure, Intel Xeon, Intel XScale, Itanium, MMX, MMX logo, Pentium, Pentium II Xeon, Pentium III Xeon, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.