Bringing the Next-generation Web to Science
Scientific community pioneers evolution of Web 2.0
Experiments and the data they produce arise in direct response to the questions and hypotheses that a scientific community puts forth. Data is always connected to people, interpretations and hypotheses; and the interplay between them, though sometimes unpredictable, is a formal process requiring consistency and logic. The connections between data, interpretations and hypotheses are captured in the form of annotations that the scientific community uses for collaboration. Annotations have long been part of the discovery and development process; they are important because they allow scientists to identify and focus on what’s most important within a sea of data. They are records of the observations and interpretations made by another scientist.
Figure 1: The Flickr photo-sharing Web site is a popular Web 2.0 site.
Annotations serve many additional roles, such as educating new project team members, reporting progress or meeting regulatory requirements. Project reports are a collection of annotations that summarize findings and evidence from the lab. It is not uncommon for a new scientist on a project to be handed a large stack of papers marked up with sticky notes and highlighter as a means for communication. Finally, data submitted to regulatory agencies is highly annotated with forms, reports and other accompanying documentation to prove compliance or seek approval.
Today’s approach to sharing annotations within the scientific community includes paper, e-mail, presentations and reports, but this creates many challenges. The most significant challenges of annotations in their current form are that they are disconnected from the data, difficult to organize, and impractical to find. Scientists cannot easily identify research or people within their community other than through literature review and traditional peer-to-peer conversations. Consequently, scientific communities are bottlenecked by the limited time available to read papers, attend conferences, meet face-to-face, or attend project meetings. This is where next-generation technologies such as Semantic Web and Web 2.0 can enable more dynamic forms of collaboration within the scientific community.
The next-generation Web
The first incarnation of the Web was a response to this need for annotating and sharing information among a distributed community of scientists. Originally, annotations were simply hypertext links and text that could be organized on a page and published. This was a breakthrough, as it allowed distributed sets of information to be connected and published in a truly open, free-form manner.
However, as content grew, users needed a way to annotate the Web itself. Since most people could not create their own HTML, browsers started to support link managers called “bookmarks,” or “favorites,” that would record and organize pages of interest. But the content continued to grow until it became apparent that it was easier just to use a search engine to find information, rather than manage the bookmarks manually; in effect, search engines became dynamic bookmarking services. And, while the first generation of the Web has been enhanced by search engines that are improving the relevance of their results, another development has been cooking up in the background — Web 2.0.
Web 2.0 integrates users through an “architecture of participation,” whereby a community of users creates a self-regulating collaborative network to find and utilize information and services. Web 2.0 sites, also called social networking sites, provide information and services not randomly discovered and crawled by robots and spiders, but identified, registered, tagged and rated by users for their own benefit.
Figure 1 shows a popular Web 2.0 site called Flickr, which facilitates photo sharing either privately or publicly. Beyond the ability to share and organize photos, Flickr provides an architecture of participation that enables tagging and markup of photos, searching based on users and tags, comments and comment trails, linking photos together, and notifications. All of these features are common collaborative capabilities of a social networking site. Social tagging, rating and commentary are pervading all parts of the Web. Examples of Web 2.0 include media sharing sites such as Flickr and YouTube, and e-Commerce sites such as Amazon and Barnes & Noble.
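To make "searching based on users and tags" concrete, here is a minimal sketch of how user-applied tags could be indexed and queried. It is not how Flickr or any other site is implemented; the `TagIndex` class, its method names, and the sample users and items are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TagIndex:
    """Toy in-memory index of user-applied tags (illustrative only)."""

    # tag -> set of item ids carrying that tag
    by_tag: dict = field(default_factory=lambda: defaultdict(set))
    # user -> set of item ids that user has tagged
    by_user: dict = field(default_factory=lambda: defaultdict(set))

    def tag(self, user: str, item: str, *tags: str) -> None:
        """Record that `user` applied each of `tags` to `item`."""
        for t in tags:
            # Tags are stored lowercased so lookups are case-insensitive.
            self.by_tag[t.lower()].add(item)
        self.by_user[user].add(item)

    def search(self, *tags: str) -> set:
        """Return the items that carry every one of the given tags."""
        if not tags:
            return set()
        sets = [self.by_tag.get(t.lower(), set()) for t in tags]
        return set.intersection(*sets)


# Hypothetical users, items and tags.
index = TagIndex()
index.tag("alice", "photo-1", "Microscopy", "neurospora")
index.tag("bob", "photo-2", "microscopy")

print(sorted(index.search("microscopy")))                # ['photo-1', 'photo-2']
print(sorted(index.search("microscopy", "neurospora")))  # ['photo-1']
```

The point of the sketch is that the index is built entirely from what users contribute, rather than from a crawl: each tagging action makes an item easier for others to find.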
Even as scientific data continues to grow both inside and outside the firewall, search engines have not solved the problem of helping scientists locate relevant information for their projects. In fact, scientific data without a social context is meaningless, since scientists collectively propose and test hypotheses. Scientific data must have this context in order to support the growth of discovery and product development. Web 2.0’s architecture of participation may well eliminate many of the technical bottlenecks in today’s scientific collaboration.
Figure 2: An annotation in action on the PubMed database of life sciences abstracts.
There is a whole class of Web 2.0 sites known as social bookmarking sites, and they point to potential new collaborative possibilities within the sciences. These sites include del.icio.us, Spurl, Furl, and CiteULike. The social bookmarking site that goes furthest for the sciences is Connotea, which has specific features for extracting and linking references from scientific papers.
From social bookmarking to social annotation
Before Web 2.0 can become an integral part of the scientific process, the social bookmarking sites need to evolve. This evolution requires the incorporation of scientific semantics and a richer set of data other than just pages on the Web. Work has begun, and the life sciences community is an early pioneer in this area.
One example is the Community Annotation Project (CAP), a collaborative effort that engages experts in the biology of Neurospora crassa to improve the quality of the N. crassa genome annotation. CAP is designed to give the fungal research community the ability to associate information with genetic features on the Neurospora genome, to refine gene structures, and to curate all entries in a structured, searchable database.
CBioC is another life sciences social annotation site; it supports extracting biomedical interactions and collaborating on them. Whenever a scientist visits PubMed, a database of life sciences abstracts, CBioC allows the scientist to semantically tag information about interactions and make it available to others. CBioC also links in and shows data from BIND, DIP, MINT, GRID and IntAct, which are other life sciences databases on the Web.
Figure 2 shows CBioC being used by a scientist. In this screenshot, an article abstract from PubMed appears in the upper portion of the Web browser, and the semantically rich annotation being made by a user appears in the bottom portion. The annotation tool shows annotations made by others and allows the scientist to make his or her own annotations. Related articles are available by clicking the link next to any of the specific annotations. Finally, search is available right in the context of annotation.
In both of these examples, the information being annotated was a rich set of data with specific semantics regarding the meaning of that data. To simplify and standardize this type of tagging, the World Wide Web Consortium (W3C) has developed a set of standards to bring semantics to a Web 2.0 for scientists. These standards include the Resource Description Framework (RDF), which provides a semantically rich tagging structure, and the Web Ontology Language (OWL), which provides a means for publishing semantic definitions. One of the first forays into using these standards in science is the Healthcare and Life Sciences Interest Group, which is a part of the W3C.
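As a rough illustration of what a "semantically rich tagging structure" means in RDF terms, the sketch below represents one annotation as subject–predicate–object triples and writes them out in the N-Triples line format. Everything specific here is an assumption made for the example: the `http://example.org/annotation/` namespace, the `interactor` and `annotatedBy` predicates, and the protein names do not come from CBioC, CAP, or any published vocabulary. A real project would more likely use an RDF library and an agreed ontology.

```python
from typing import NamedTuple

# Hypothetical namespace for this sketch; not a real vocabulary.
EX = "http://example.org/annotation/"


class Triple(NamedTuple):
    subject: str             # URI of the thing being described
    predicate: str           # URI naming the relationship
    obj: str                 # URI, or a plain literal value
    obj_is_uri: bool = True  # False when obj is a literal string


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes and newlines in a literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(triples) -> str:
    """Serialize triples as one '<s> <p> o .' statement per line."""
    lines = []
    for t in triples:
        obj = f"<{t.obj}>" if t.obj_is_uri else f'"{escape_literal(t.obj)}"'
        lines.append(f"<{t.subject}> <{t.predicate}> {obj} .")
    return "\n".join(lines)


def describe_interaction(annotation_id, protein_a, protein_b, annotator):
    """Build the triples for one annotated interaction (illustrative schema)."""
    node = EX + annotation_id
    return [
        Triple(node, EX + "interactor", EX + "protein/" + protein_a),
        Triple(node, EX + "interactor", EX + "protein/" + protein_b),
        Triple(node, EX + "annotatedBy", annotator, obj_is_uri=False),
    ]


# Hypothetical annotation: protein names and annotator are made up.
print(to_ntriples(describe_interaction("a1", "P1", "P2", "A. Scientist")))
```

Because every statement names its subject and relationship with a URI, annotations written by different people or tools can be merged and queried together, which a free-text tag cannot offer.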
Science Drivers for Web 2.0
There are many potential drivers for the adoption of Web 2.0 style collaboration within the sciences. However, partnering has the potential to create the most demand. More and more, discovery and development is a collaborative process that spans multiple companies and includes academia and government as well. According to Burrill’s Biotech 2006, an industry-record $17 billion was generated through partnering. The biopharma industry, for example, is moving toward a distributed value chain of scientific collaboration. The efficiency of this value chain cannot be sustained by face-to-face meetings and office documents. The distributed value chain, more efficient interaction with regulatory agencies, and geographically dispersed teams will require a new means of collaboration. Web 2.0 just might help.
Matt Shanahan is the Chief Marketing Officer at Teranode. He may be reached at [email protected].
Social Networking Resources
| Category | Site | URL |
| --- | --- | --- |
| Media Sharing Sites | Flickr | www.flickr.com |
| Media Sharing Sites | YouTube | www.youtube.com |
| e-Commerce Sites | Amazon | www.amazon.com |
| e-Commerce Sites | Barnes & Noble | www.barnesandnoble.com |
| Social Bookmarking Sites | CiteULike | www.citeulike.org |
| Social Bookmarking Sites | Connotea | www.connotea.org |
| Social Bookmarking Sites | del.icio.us | www.del.icio.us |
| Social Bookmarking Sites | Furl | www.furl.net |
| Social Bookmarking Sites | Spurl | www.spurl.net |
| Life Sciences Social Annotation Sites | Broad Institute: N. crassa Community Annotation | www.broad.mit.edu/annotation/genome/neurospora/cahome.html |
| Life Sciences Social Annotation Sites | Collaborative Bio Curation (CBioC) | www.cbioc.eas.asu.edu/index.php |
| Use of Semantic Standards | W3C Healthcare and Life Sciences Interest Group | www.w3.org/2001/sw/hcls |