The TOP500 project was started in 1993 to provide a reliable basis for tracking and detecting trends in high-performance computing. Twice a year, a list of the sites operating the 500 most powerful computer systems is assembled and released. The best performance on the Linpack benchmark is used as a performance measure for ranking the computer systems. The list contains a variety of information including the system specifications and its major application areas.
Q: What is the TOP500?
A: The TOP500 lists the 500 most powerful computer systems in use today. The collection was started in 1993 and has been updated every six months since then. The report lists the sites that have the 500 most powerful computer systems installed, ranked by the best Linpack benchmark performance achieved.
Q: What is the Linpack Benchmark?
A: The Linpack Benchmark is a measure of a computer’s floating-point rate of execution. It is determined by running a computer program that solves a dense system of linear equations. Over the years, the characteristics of the benchmark have changed a bit; in fact, there are three benchmarks included in the Linpack Benchmark report.
The Linpack Benchmark grew out of the Linpack software project. It was originally intended to give users of the package a feeling for how long it would take to solve certain matrix problems. The benchmark started as an appendix to the Linpack Users’ Guide and has grown since the guide was published in 1979.
Q: What is the Linpack Benchmark report?
A: The Linpack Benchmark report is entitled “Performance of Various Computers Using Standard Linear Equations Software.” The report lists the performance in Mflop/s of a number of computer systems. A copy of the report is available at http://www.netlib.org/benchmark/performance.ps.
Q: How do I reference the Linpack Benchmark report?
A: The Linpack Benchmark report should be referenced in the following way: “Performance of Various Computers Using Standard Linear Equations Software”, Jack Dongarra, University of Tennessee, Knoxville TN, 37996, Computer Science Technical Report Number CS-89-85, today’s date, http://www.netlib.org/benchmark/performance.ps
Q: Is there a paper that describes the Linpack Benchmark in detail?
A: The paper “The LINPACK Benchmark: Past, Present, and Future” by Jack Dongarra, Piotr Luszczek, and Antoine Petitet provides a detailed look at the benchmark and presents performance data in graphical form for a number of machines on basic operations. A copy of the paper is available at http://www.netlib.org/utk/people/JackDongarra/PAPERS/hpl.pdf
Q: What is a Mflop/s?
A: Mflop/s is a rate of execution: millions of floating point operations per second. Whenever this term is used, it refers to 64-bit floating point operations, and the operations are either additions or multiplications. Gflop/s refers to billions of floating point operations per second and Tflop/s refers to trillions of floating point operations per second.
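As a quick illustration, converting an operation count and a run time into a rate is simple arithmetic. The numbers below are hypothetical, chosen only to show the conversion; they are not taken from the report:

```python
# A rate in Mflop/s is (floating-point operations) / (seconds) / 10^6.
ops = 2.0e8        # hypothetical count of 64-bit additions and multiplications
seconds = 0.5      # hypothetical measured execution time
mflops = ops / seconds / 1.0e6
print(mflops)      # 400.0, i.e. 400 Mflop/s (0.4 Gflop/s)
```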
Q: What is the theoretical peak performance?
A: The theoretical peak is based not on actual performance from a benchmark run, but on a paper computation of the theoretical peak rate of execution of floating point operations for the machine. This is the number manufacturers often cite; it represents an upper bound on performance. That is, the manufacturer guarantees that programs will not exceed this rate, a sort of “speed of light” for a given computer. The theoretical peak performance is determined by counting the number of floating-point additions and multiplications (in full precision) that can be completed during a period of time, usually the cycle time of the machine. For example, an Intel Itanium 2 at 1.5 GHz can complete 4 floating point operations per cycle, for a theoretical peak performance of 6 Gflop/s.
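The Itanium 2 example can be checked with the same paper computation: multiply the clock rate by the number of floating-point operations completed per cycle. This is a sketch; the flops-per-cycle figure is machine-specific:

```python
# Theoretical peak = clock rate * floating-point operations completed per cycle.
clock_ghz = 1.5        # Intel Itanium 2 clock rate from the example above
flops_per_cycle = 4    # adds/multiplies the machine can complete each cycle
peak_gflops = clock_ghz * flops_per_cycle
print(peak_gflops)     # 6.0 Gflop/s
```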
Q: What are the three benchmarks in the Linpack Benchmark report?
A: The three benchmarks in the Linpack Benchmark report are the Linpack Fortran n = 100 benchmark (see Table 1 of the report), the Linpack n = 1000 benchmark (see Table 1 of the report), and the Linpack Highly Parallel Computing benchmark (see Table 3 of the report).
Q: What is the Linpack Fortran n = 100 benchmark?
A: The first benchmark is for a matrix of order 100 using the Linpack software in Fortran. The results can be found in Table 1 of the benchmark report. To run this benchmark, download the file from http://www.netlib.org/benchmark/linpackd; this is a Fortran program. To run the program, you will need to supply a timing function called SECOND, which should report the CPU time that has elapsed. The ground rules for running this benchmark are that you can make no changes to the Fortran code, not even to the comments. Only compiler optimization can be used to enhance performance.
Q: What does the n = 100 benchmark measure?
A: The Linpack benchmark measures the performance of two routines from the Linpack collection of software: DGEFA and DGESL (these are the double-precision versions; SGEFA and SGESL are their single-precision counterparts). DGEFA performs the LU decomposition with partial pivoting, and DGESL uses that decomposition to solve the given system of linear equations.
Most of the time is spent in DGEFA. Once the matrix has been decomposed, DGESL is used to find the solution; this process requires O(n^2) floating-point operations, as opposed to the O(n^3) floating-point operations of DGEFA. The results for this benchmark can be found in the second column of Table 1, under “LINPACK Benchmark n = 100”, of the Linpack Benchmark Report.
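The division of labor between DGEFA and DGESL can be sketched in Python. This is an illustrative analog, not the benchmark’s Fortran: the real routines work on column-major arrays, call the BLAS, and the sketch below omits any singularity check:

```python
def lu_factor(a):
    """LU decomposition with partial pivoting (the DGEFA step), O(n^3) work.
    Factors a in place into unit-lower L and upper U; returns the pivot order."""
    n = len(a)
    piv = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest |a[i][k]| on or below the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if p != k:
            a[k], a[p] = a[p], a[k]
            piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]          # multiplier, stored in L's slot
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return piv

def lu_solve(a, piv, b):
    """Solve using the stored factorization (the DGESL step), O(n^2) work."""
    n = len(a)
    x = [b[piv[i]] for i in range(n)]   # apply the row permutation to b
    for i in range(n):                  # forward substitution with L
        for j in range(i):
            x[i] -= a[i][j] * x[j]
    for i in reversed(range(n)):        # back substitution with U
        for j in range(i + 1, n):
            x[i] -= a[i][j] * x[j]
        x[i] /= a[i][i]
    return x

# Demo on a tiny system: 2x + y = 3, x + 3y = 4  ->  x = y = 1
a = [[2.0, 1.0], [1.0, 3.0]]
piv = lu_factor(a)
print(lu_solve(a, piv, [3.0, 4.0]))  # [1.0, 1.0]
```

Note how the solve step reuses the factorization: once the O(n^3) work is done, each new right-hand side costs only O(n^2).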
Q: What is the Linpack n = 1000 (TPP) benchmark?
A: The second benchmark is for a matrix of order 1000, and its results can be found in Table 1 of the benchmark report. To run this benchmark, download the file from http://www.netlib.org/benchmark/1000d; this is a Fortran driver. The ground rules for running this benchmark are a bit more relaxed: you can use any linear equation solver you wish, implemented in any language. The only requirement is that your method must compute a solution to the prescribed accuracy. TPP stands for Toward Peak Performance; this is the title of the column in the benchmark report that lists the results.
Q: How should the benchmark results be interpreted?
A: The performance of a computer is a complicated issue, a function of many interrelated quantities. These quantities include the application, the algorithm, the size of the problem, the high-level language, the implementation, the human level of effort used to optimize the program, the compiler’s ability to optimize, the age of the compiler, the operating system, the architecture of the computer, and the hardware characteristics. The results presented for these benchmark suites should not be extolled as measures of total system performance (unless enough analysis has been performed to indicate a reliable correlation of the benchmarks to the workload of interest) but, rather, as reference points for further evaluations.
Q: Why do my results differ from the results in the Linpack Benchmark Report?
A: There are many reasons why your results may vary from the results recorded in the Linpack Benchmark Report. Issues such as load on the system, accuracy of the clock, compiler options, version of the compiler, size of cache, bandwidth from memory, amount of memory, etc., can affect performance even when the processors are the same.
Q: What is the Highly Parallel Computing benchmark?
A: The third benchmark is called the Highly Parallel Computing Benchmark and can be found in Table 3 of the Benchmark Report. (This is the benchmark used for the TOP500 report.) This benchmark attempts to measure the best performance of a machine in solving a system of equations; the problem size and software can be chosen to produce the best performance. An implementation is available at http://www.netlib.org/benchmark/hpl/
Q: What are the “ground rules” for the n = 100 benchmark?
A: The “ground rules” for running the first benchmark in the report, the n = 100 case, are that the program is run as is: no changes to the source code are allowed, not even to the comments. Optimization may be performed only by the compiler, through compiler switches, at compile time. The user must supply a timing function called SECOND, which returns the running CPU time for the process. The matrix generated by the benchmark program must be used to run this case.
Q: What are the “ground rules” for the n = 1000 benchmark?
A: The “ground rules” for running the second benchmark in the report, the n = 1000 case, allow for a complete user replacement of the LU factorization and solver steps. The calling sequence should be the same as that of the original routines, and the problem size should be of order 1000. The accuracy of the solution must satisfy the following bound:
||Ax - b|| / (||A|| ||x|| n eps) = O(1)
where eps is the relative machine precision (on IEEE machines this is 2^-53) and n is the size of the problem. The matrix used must be the same matrix used in the driver program available from netlib.
Q: What are the “ground rules” for the Highly Parallel benchmark?
A: The “ground rules” for running the third benchmark in the report, the Highly Parallel case, allow for a complete user replacement of the LU factorization and solver steps. The accuracy of the solution must satisfy the following bound:
||Ax - b|| / (||A|| ||x|| n eps) = O(1)
where eps is the relative machine precision (on IEEE machines this is 2^-53) and n is the size of the problem. The matrix used must be the same matrix used in the driver program available from netlib. There is no restriction on the problem size.
Q: What accuracy must the solution satisfy?
A: The solution to all three benchmarks must satisfy the following bound:
||Ax - b|| / (||A|| ||x|| n eps) = O(1)
where eps is the relative machine precision (on IEEE machines this is 2^-53) and n is the size of the problem. This implies the computation must be done in 64-bit floating point arithmetic.
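A residual check of this form can be sketched as follows. The function name and the use of the infinity norm here are illustrative; the drivers available from netlib implement the official check:

```python
def residual_ratio(a, x, b, eps=2.0**-53):
    """||Ax - b|| / (||A|| ||x|| n eps), using infinity norms.
    eps = 2^-53 is the relative machine precision of IEEE 64-bit arithmetic.
    A computed solution passes when this ratio is O(1), i.e. a small constant."""
    n = len(a)
    r = [sum(aij * xj for aij, xj in zip(row, x)) - bi   # residual r = Ax - b
         for row, bi in zip(a, b)]
    norm_r = max(abs(v) for v in r)
    norm_a = max(sum(abs(v) for v in row) for row in a)  # row-sum (inf) norm
    norm_x = max(abs(v) for v in x)
    return norm_r / (norm_a * norm_x * n * eps)

# An exact solution of 2x + y = 3, x + 3y = 4 gives a zero residual.
print(residual_ratio([[2.0, 1.0], [1.0, 3.0]], [1.0, 1.0], [3.0, 4.0]))  # 0.0
```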
Q: What do you mean by “full precision”?
A: In order to have an entry included in the Linpack Benchmark report, the results must be computed using full precision. By full precision, we generally mean 64-bit floating point arithmetic or higher. Note that this is not an issue of single versus double precision, as some systems have 64-bit floating point arithmetic as single precision. It is a function of the arithmetic used.
Q: Can I get a customized list of results?
A: You can get a more personalized listing of machines by using the interface at http://performance.netlib.org/performance/html/PDSbrowse.html
This list is not kept current, however, and may lag the Linpack benchmark report by months.
Q: Where can I get the benchmark programs?
A: You can download the programs used to generate the Linpack benchmark results from http://www.netlib.org/benchmark/linpackd; this is a Fortran program. There is a C version of the benchmark at http://www.netlib.org/benchmark/linpackc, and there is a Java version of the benchmark that can be downloaded as an applet.
Q: What do I need to supply to run the Fortran version?
A: For the 100×100 Fortran version, you need to supply a timing function called SECOND. SECOND is a timer function that will be called from Fortran and is expected to return the running CPU time in seconds. In the program, two calls to SECOND are made and the difference is taken to gather the time.
Q: How well does the Linpack Benchmark predict the performance of my applications?
A: The performance of the Linpack benchmark is typical for applications whose basic operation is built on vector primitives, such as adding a scalar multiple of a vector to another vector. Many applications exhibit the same performance as the Linpack Benchmark. However, the results should not be taken too seriously. To measure the performance of any computer, it is critical to probe the performance of your own applications; the Linpack Benchmark can only give one point of reference. In addition, in multiprogramming environments it is often difficult to reliably measure the execution time of a single program. We trust that anyone actually evaluating machines and operating systems will gather more reliable and more representative data.
Q: What if I find an error in the report?
A: While we make every attempt to verify the results obtained from users and vendors, errors are bound to exist and should be brought to our attention. We encourage users to obtain the programs, run the routines on their machines, and report any discrepancies with the numbers listed here.
Q: What is Linpack?
A: The Linpack package is a collection of Fortran subroutines for solving various systems of linear equations (http://www.netlib.org/Linpack/). The software in Linpack is based on a decompositional approach to numerical linear algebra. The general idea is the following: given a problem involving a matrix, one factors or decomposes the matrix into a product of simple, well-structured matrices that can be easily manipulated to solve the original problem. The package can handle many different matrix types and data types, and provides a range of options. Linpack itself is built on another package called the BLAS. Linpack was designed in the late 1970s and has been superseded by a package called LAPACK.
Q: Where can I get the Linpack software?
A: The Linpack software library is available from netlib; see http://www.netlib.org/Linpack/
Q: What is the BLAS?
A: The BLAS (Basic Linear Algebra Subprograms) are high-quality “building block” routines for performing basic vector and matrix operations. Level 1 BLAS do vector-vector operations, Level 2 BLAS do matrix-vector operations, and Level 3 BLAS do matrix-matrix operations. Because the BLAS are efficient, portable, and widely available, they are commonly used in the development of high-quality linear algebra software such as LINPACK and LAPACK. For additional information see: http://www.netlib.org/blas/
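The three levels correspond to the following operation shapes. These are plain-Python sketches of the patterns, not the BLAS interfaces themselves (the real routines take strides, leading dimensions, and so on):

```python
def axpy(alpha, x, y):
    """Level 1 pattern (DAXPY): y <- alpha*x + y. O(n) work on O(n) data."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def gemv(a, x):
    """Level 2 pattern (core of DGEMV): matrix-vector product. O(n^2) work."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def gemm(a, b):
    """Level 3 pattern (core of DGEMM): matrix-matrix product. O(n^3) work
    on O(n^2) data, which is what makes cache reuse possible."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]
```

The ratio of arithmetic to data movement grows with the level, which is why Level 3 operations are the ones worth tuning per machine.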
Q: What is ATLAS?
A: The ATLAS (Automatically Tuned Linear Algebra Software) project is an ongoing research effort focusing on applying empirical techniques in order to provide portable performance for the BLAS routines. At present, it provides C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK. For additional information see: http://www.netlib.org/atlas/
Q: Why is Linpack not the most efficient software for solving matrix problems?
A: Linpack is not the most efficient software for solving matrix problems, mainly because of the way the algorithm and the resulting software access memory. The memory access patterns of the algorithm disregard the multi-layered memory hierarchies of RISC architectures and vector computers, thereby spending too much time moving data instead of doing useful floating-point operations. LAPACK addresses this problem by reorganizing the algorithms to use block matrix operations, such as matrix multiplication, in the innermost loops. For each computer architecture, block operations can be optimized to account for memory hierarchies, providing a transportable way to achieve high efficiency on diverse modern machines. We use the term “transportable” instead of “portable” because, for fastest possible performance, LAPACK requires that highly optimized block matrix operations be already implemented on each machine. These operations are performed by the Level 3 BLAS in most cases.
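The idea behind block matrix operations can be sketched as follows: the arithmetic is identical to the naive triple loop, but it proceeds block by block so that each small block can stay in fast memory while it is reused. The block size and loop order here are illustrative; a tuned Level 3 BLAS chooses them per machine:

```python
def blocked_gemm(a, b, nb=2):
    """Blocked matrix multiply c = a*b for square matrices.
    Same operation count as the naive triple loop, but iterating over
    nb x nb blocks improves cache reuse on hierarchical memories."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, nb):
        for jj in range(0, n, nb):
            for kk in range(0, n, nb):
                # Multiply one pair of blocks and accumulate into c's block.
                for i in range(ii, min(ii + nb, n)):
                    for j in range(jj, min(jj + nb, n)):
                        s = c[i][j]
                        for k in range(kk, min(kk + nb, n)):
                            s += a[i][k] * b[k][j]
                        c[i][j] = s
    return c
```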
Q: What is LAPACK?
A: LAPACK is a software collection for solving various matrix problems in linear algebra; in particular, systems of linear equations, least squares problems, eigenvalue problems, and singular value decompositions. The software is based on block-partitioned matrix techniques that aid in achieving high performance on RISC-based systems, vector computers, and shared-memory parallel processors.
Q: Where can I get LAPACK?
A: LAPACK can be obtained from netlib; see http://www.netlib.org/lapack/
Q: How did the Linpack Benchmark come about?
A: The Linpack Benchmark is, in some sense, an accident. It was originally designed to assist users of the Linpack package by providing information on the execution times required to solve a system of linear equations. The first “Linpack Benchmark” report appeared as an appendix in the Linpack Users’ Guide in 1979. The appendix comprised data for one commonly used path in Linpack for a matrix problem of size 100, on a collection of widely used computers (23 in all), so users could estimate the time required to solve their matrix problem.
Over the years other data was added, more as a hobby than anything else, and today the collection includes hundreds of different computer systems.
Q: How do I submit results for the report?
A: You can contact Jack Dongarra and send him the output of the benchmark program. When sending results, please include specific information on the computer on which the test was run, the compiler, the optimization that was used, and the site where it was run. You can contact Dongarra by sending email to email@example.com.
Q: How is the execution time gathered?
A: To run the benchmark program, you will have to supply a function to gather the execution time on your computer. The execution time is requested by a call to the Fortran function SECOND, which is expected to return the accumulated execution time of your program. Two calls to SECOND are made and the difference is taken to compute the execution time.
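The same two-calls-and-difference pattern, sketched in Python rather than Fortran. Here time.process_time plays the role of SECOND; the function name and the timed loop are illustrative:

```python
import time

def second():
    # Analog of the Fortran SECOND function: accumulated CPU time in seconds.
    return time.process_time()

t1 = second()
total = sum(i * i for i in range(100_000))   # the work being timed
t2 = second()
elapsed = t2 - t1                            # difference of the two calls
```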
Q: What is PAPI?
A: The Performance API (PAPI) project specifies a standard application programming interface (API) for accessing the hardware performance counters available on most modern microprocessors. These counters exist as a small set of registers that count events: occurrences of specific signals related to the processor’s function. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture.
For additional information see: http://icl.cs.utk.edu/projects/papi/
Q: What precision was used to run the benchmark?
A: The results reported in the benchmark report reflect performance for 64-bit floating point arithmetic. On some machines this may be DOUBLE PRECISION, such as computers that have IEEE floating point arithmetic, and on other computers it may be single precision (declared REAL in Fortran), such as Cray’s vector computers.
- For more frequently asked questions about the TOP500 project and list, visit http://www.top500.org/resources/frequently-asked-questions/