Cyberinfrastructure resources at academic research institutions have increased substantially since 2005, according to data from the National Science Foundation’s (NSF’s) FY 2011 Survey of Science and Engineering Research Facilities. In FY 2011, 59 percent of academic institutions reported bandwidth of at least 1 gigabit per second (Gbps), compared with 21 percent of such institutions in FY 2005 (table 1).[2] The percentage of academic institutions with network connections of 10 Gbps or greater increased from 2 percent to 25 percent during this period.
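To put these bandwidth tiers in practical terms, the short sketch below estimates how long it would take to move a hypothetical 1-terabyte research data set at several of the connection speeds reported in the survey. The data set size is an illustrative assumption, and the calculation is idealized (it ignores protocol overhead, congestion, and storage throughput); it is meant only to show why the higher tiers matter for data-intensive research.

```python
# Idealized transfer time for a hypothetical 1-terabyte data set at several
# of the bandwidth tiers reported in the survey. Ignores protocol overhead,
# congestion, and storage throughput; for illustration only.

DATASET_BYTES = 1e12                 # 1 terabyte = 10^12 bytes (assumption)
DATASET_BITS = DATASET_BYTES * 8

# Bandwidth tiers in bits per second (1 Gbps = 10^9 bits per second)
tiers = {
    "100 Mbps": 100e6,
    "1 Gbps": 1e9,
    "2.5 Gbps": 2.5e9,
    "10 Gbps": 10e9,
}

for label, bits_per_second in tiers.items():
    hours = DATASET_BITS / bits_per_second / 3600
    print(f"{label:>8}: {hours:5.2f} hours")
```

Under these idealized assumptions, the transfer takes nearly a full day at 100 Mbps but less than 15 minutes at 10 Gbps.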
Table 1
* = value > 0 but < 0.5 percent.
na = not applicable; category was added to the FY 2011 survey.
Gbps = gigabits per second; Mbps = megabits per second.
a Figures for FY 2012 are estimated by the responding institutions.
NOTES: Details may not add to 100 percent due to rounding. FY 2009, FY 2011, and FY 2012 totals include bandwidth to the commodity Internet (Internet1), Internet2 (a high-performance hybrid optical packet network), and National LambdaRail (an advanced optical network infrastructure for research and education). Data for FY 2005 and FY 2007 are limited to Internet1 and Internet2. The response categories in the FY 2005 survey varied slightly from those in the FY 2007–FY 2011 surveys; the FY 2005 survey included the categories "1 to 2.5 Gb" (gigabits) and "2.6 to 9 Gb."
SOURCE: National Science Foundation, National Center for Science and Engineering Statistics, Survey of Science and Engineering Research Facilities.

Networking

Access to High-Speed Bandwidth

Academic institutions have continued to gain greater access to high-speed bandwidth through a network of national and regional providers. The backbone of the national network is managed primarily by not-for-profit consortia. These include Internet2, an organization established in 1997 that comprises research, academic, industry, and government partners, and National LambdaRail, a university-owned organization established in 2003 that manages a 12,000-mile high-speed network. The Energy Sciences Network, a U.S. Department of Energy (DOE)–funded network supporting 40 DOE sites as well as researchers at universities and other research institutions, also contributes to the national network. Greater access to high-speed bandwidth is also facilitated by regional networks and gigapops (gigabit points of presence), which provide connections to the national networks as well as to universities and laboratories. These regional networks also provide advanced network services to ensure reliable and efficient data transfer.

Doctorate-granting institutions constituted 70 percent of the academic institutions reporting at least $1 million in science and engineering research and development expenditures in FY 2011. These institutions were more likely than nondoctorate-granting institutions to have higher-speed bandwidth capacity. In FY 2011, the percentage of doctorate-granting institutions with bandwidth of at least 2.5 Gbps (43 percent) was more than 10 times that of nondoctorate-granting institutions (4 percent) (table 2). This pattern continued in FY 2012, when 53 percent of doctorate-granting institutions estimated that they would have bandwidth of 2.5 Gbps or greater, compared with 5 percent of nondoctorate-granting institutions.
Table 2
* = value > 0 but < 0.5 percent.
Gbps = gigabits per second; Mbps = megabits per second.
NOTES: Details may not add to 100 percent due to rounding. Total includes bandwidth to the commodity Internet (Internet1), Internet2 (a high-performance hybrid optical packet network), and National LambdaRail (an advanced optical network infrastructure for research and education).
SOURCE: National Science Foundation, National Center for Science and Engineering Statistics, Survey of Science and Engineering Research Facilities, FY 2011.
Public doctorate-granting institutions were more likely than their private counterparts to have access to higher-speed bandwidth. Thirty-eight percent of public doctorate-granting institutions had bandwidth access of 10 Gbps or greater in FY 2011, compared with 22 percent of their private counterparts. Forty-nine percent of these public institutions expected to have bandwidth access of 10 Gbps or greater by FY 2012, whereas 25 percent of private doctorate-granting institutions expected to have these resources.

Dark Fiber

Dark fiber is fiber-optic cable that has already been laid but is not yet being used. The practice of laying cable in anticipation of future use is a cost-saving approach in network planning: it is more economical to lay excess cable during construction than to install it later, when usage demands increase. This cable-in-waiting can be "lit," or engaged, quickly. Thus the amount of dark fiber owned by institutions indicates their ability to expand existing network capabilities, either between existing campus buildings or from the campus to an external network.

The percentage of academic institutions with these available cables has increased steadily with each biennial survey cycle. The percentage of institutions with dark fiber to an external network grew from 29 percent in FY 2005 to 47 percent in FY 2011. The percentage of institutions with dark fiber between their own buildings remained high throughout this period, increasing slightly from 86 percent in FY 2005 to 90 percent in FY 2011 (table 3).
Table 3
ISP = internet service provider or external network.
NOTES: Percentages reflect academic institutions that owned dark fiber at the end of the fiscal year. Dark fiber is fiber-optic cable that has already been laid but is not being used. The number of academic institutions responding in each cycle was as follows: FY 2005, 449; FY 2007, 448; FY 2009, 495; and FY 2011, 539.
SOURCE: National Science Foundation, National Center for Science and Engineering Statistics, Survey of Science and Engineering Research Facilities.
High-Performance Computing

Resource Management

Many academic research institutions manage their high-performance computing (HPC) resources through a distinct organizational unit within the institution that has a separate staff and budget. A total of 192 of the 539 surveyed academic institutions reported ownership of centrally administered HPC resources of 1 teraflop or faster in FY 2011 (table 4).[3] This administrative approach enables faculty to focus on their primary responsibilities instead of being diverted by the administration and fund-raising needed to support their own HPC resources. Central HPC administration can decrease overall operating expenses and make computing resources more widely available.[4] However, many HPC resources are supported by external funding sources and managed by the researchers themselves rather than by institutional administrators. These resources are difficult for administrators to track and are therefore not included in these data.
Table 4
HPC = high-performance computing; MPP = massively parallel processors; SMP = symmetric multiprocessors.
NOTES: Each institution is counted only once under each architecture. Only HPC systems with peak performance of 1 teraflop or faster are included. Centrally administered HPC is located within a distinct organizational unit with a staff and a budget; the unit has a stated mission that includes supporting the HPC needs of faculty and researchers. Institutions may have HPC of more than one type of architecture. Accelerators may be system components or independent.
SOURCE: National Science Foundation, National Center for Science and Engineering Statistics, Survey of Science and Engineering Research Facilities, FY 2011.
Forty-seven percent of doctorate-granting institutions provided HPC resources for their campuses, compared with less than 9 percent of nondoctorate-granting institutions (table 4). Similar percentages of public doctorate-granting (48 percent) and private doctorate-granting (45 percent) institutions provided these resources.

Clusters are the most common centrally administered HPC architecture employed by academic institutions because they provide the most flexibility and cost efficiency for scaling, in addition to their generally lower administrative costs. Ninety-seven percent of HPC-providing institutions employ cluster architectures. HPC-providing institutions also use architectures such as massively parallel processors (11 percent of institutions), symmetric multiprocessors (19 percent), or "other" types of architectures (20 percent), all of which can be used in conjunction with or as an alternative to clusters.[5]

In FY 2011, 24 institutions possessed centrally administered HPC resources with combined computing capacity of at least 100.0 teraflops (figure 1). Another 29 institutions had combined computing capacity of 40.0 to 99.9 teraflops. As recently as FY 2007, only four institutions reported centrally administered computing capacity of at least 40.0 teraflops, with only one institution surpassing the 100.0 teraflop threshold. The median total performance reported for centrally administered systems was 14 teraflops in FY 2011, compared with 8 teraflops in FY 2009 and 4 teraflops in FY 2007 (data not shown).

Figure 1
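Note [3] defines a teraflop as 1 trillion floating point operations per second. As a rough illustration of how a cluster's aggregate capacity reaches the ranges shown in figure 1, the sketch below computes the theoretical peak performance of a hypothetical cluster from its node count, cores per node, clock rate, and floating point operations per core per clock cycle. The configuration is an illustrative assumption, not a system reported in the survey.

```python
# Theoretical peak performance of a hypothetical cluster. The configuration
# below is an illustrative assumption, not drawn from the survey data.
#   peak flops = nodes x cores per node x clock rate (Hz) x flops per core per cycle

nodes = 128              # hypothetical node count
cores_per_node = 16      # hypothetical cores per node
clock_hz = 2.6e9         # 2.6 GHz clock rate
flops_per_cycle = 8      # e.g., a core with 256-bit vector floating point units

peak_flops = nodes * cores_per_node * clock_hz * flops_per_cycle
print(f"Theoretical peak: {peak_flops / 1e12:.1f} teraflops")  # about 42.6 teraflops
```

The survey counts systems by peak performance (see the notes to table 4); sustained performance on real workloads is typically well below this theoretical figure.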
Resource Sharing

Colleges and universities often share their HPC resources with external organizations. In FY 2011, these partnerships most often involved other colleges or universities, as 72 percent of academic institutions shared resources with their peers. Sharing of HPC resources with other external users was less common but still notable for government (21 percent), industry (18 percent), and nonprofit organization (17 percent) partners. Public institutions were more likely than private institutions to have external users of their HPC resources (data not shown).

Data Storage

As the collection of massive data sets has increased in recent years, data storage and curation have become increasingly critical issues. Data management plans are often required in grant proposals where large data sets will be used. Of the academic institutions with centrally administered HPC resources in FY 2011, 56 percent reported usable online storage greater than 100 terabytes (table 5).[6] A smaller share of public (21 percent) and private (18 percent) institutions provided more than 500 terabytes of online storage.
Table 5
NOTES: A total of 190 of the 192 institutions with centrally administered high-performance computing resources provided data on usable online storage. A total of 191 provided data on archival storage.
SOURCE: National Science Foundation, National Center for Science and Engineering Statistics, Survey of Science and Engineering Research Facilities, FY 2011.
As of FY 2011, 45 percent of institutions with centrally administered HPC resources reported no archival storage. Archival storage includes online and off-line storage for files and data that are not immediately accessible from HPC resources. This figure changed little from FY 2009 (43 percent) but is much higher than it was in FY 2007 (28 percent).

Data Sources and Availability

The data presented in this InfoBrief were obtained from NSF's FY 2011 Survey of Science and Engineering Research Facilities. Data from the computing and networking capacity section of the survey were provided by 539 of 554 colleges and universities with at least $1 million in science and engineering research and development expenditures. Institutions were identified as meeting this threshold through the FY 2010 NSF Higher Education Research and Development Survey. The full set of detailed tables is available in the report Science and Engineering Research Facilities: Fiscal Year 2011 at http://www.nsf.gov/statistics/facilities/. Please contact the author for more information.

This survey has been conducted biennially since 1986. The computing and networking capacity section was added in 2003.

Notes

[1] Michael T. Gibbons, Research and Development Statistics Program, National Center for Science and Engineering Statistics, National Science Foundation, 4201 Wilson Boulevard, Suite 965, Arlington, VA 22230 (mgibbons@nsf.gov; 703-292-4590).

[2] The data transfer rate of a computer network is measured in bits per second. One gigabit per second (Gbps) equals 1 billion bits per second.

[3] In FY 2011, 36 percent of institutions (192 of 539) reported centrally administered HPC resources of 1 teraflop or faster. The corresponding shares in preceding cycles were 28 percent (136 of 494) in FY 2009 and 22 percent (99 of 449) in FY 2007. Floating point operations per second (flops) measure the number of arithmetic operations on floating point numbers that a computer processor can perform in 1 second. A teraflop is a measure of computing speed equal to 1 trillion floating point operations per second.

[4] These points have been cited as rationales for centralizing cyberinfrastructure and HPC resources at several institutions. See the following documents as examples:
The University of Arizona, University Information Technology Services. Program Benefits to Researchers and Campus. Available at http://rc.arizona.edu/node/117.
University of California, San Diego, UCSD Research Cyberinfrastructure Design Team. 2009. Blueprint for the Digital University: UCSD Research Cyberinfrastructure. Available at http://rci.ucsd.edu/_files/Blueprint.pdf.
Bose R, Crosswell A, Hamilton V, Mesa N. 2010. Piloting sustainable HPC for research at Columbia. Position paper for the Workshop on Sustainable Funding and Business Models for Academic Cyberinfrastructure (CI) Facilities. Cornell University, Ithaca, NY.

[5] Clusters use multiple commodity systems, each running its own operating system, connected by a high-performance interconnect network so that they perform as a single system. Massively parallel processors use multiple processors within a single system with a specialized high-performance interconnect network; each processor uses its own memory and operating system. Symmetric multiprocessors use multiple processors sharing the same memory and operating system to work simultaneously on individual pieces of a program.
[6] Online storage includes all storage providing immediate access for files and data from HPC systems of at least 1 teraflop. Storage can be either locally available or made available via a network. Usable storage is the amount of space for data storage that is available for use after the space overhead required by file systems and applicable RAID (redundant array of independent disks) configurations is removed.
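As a concrete illustration of the usable-versus-raw distinction in note [6], the sketch below estimates usable capacity for a hypothetical storage pool after parity-based RAID overhead and a file system reserve are subtracted. The disk counts, RAID level, and overhead fraction are illustrative assumptions, not values from the survey.

```python
# Illustrative estimate of usable storage after parity and file system
# overhead are removed. All values below are hypothetical assumptions,
# not figures from the survey.

def usable_terabytes(disks: int, disk_tb: float, parity_disks: int,
                     fs_overhead: float = 0.05) -> tuple[float, float]:
    """Return (raw, usable) capacity in terabytes for one RAID group,
    subtracting parity disks (e.g., 2 for RAID 6) and a fractional
    file system/metadata reserve."""
    raw = disks * disk_tb
    usable = (disks - parity_disks) * disk_tb * (1 - fs_overhead)
    return raw, usable

# A hypothetical pool: ten identical RAID 6 groups of twelve 4-terabyte drives
groups = 10
raw, usable = usable_terabytes(disks=12, disk_tb=4, parity_disks=2)
print(f"Raw capacity:    {raw * groups:.0f} TB")     # 480 TB
print(f"Usable capacity: {usable * groups:.0f} TB")  # 380 TB
```

Under these assumptions, roughly a fifth of the raw capacity is consumed by parity and file system overhead, which is why the survey asks institutions to report usable rather than raw storage.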