In Grids We Trust
An object’s provenance, meaning its history and how it came to be, is what gives it status. Translating this idea to computing will allow the information generated and managed within distributed networks to be verified and trusted. A team of European researchers is laying the foundations for this translation.
Understanding the process by which a result was generated is fundamental to many real-life applications in science, engineering, medicine, supply management and other fields. Without such information, users cannot reproduce, analyze or validate processes or experiments. Provenance is therefore essential for letting users, scientists and engineers trace how a particular result came about.
Networks of computers at distributed locations, known as Grids, operate by dynamically creating services at opportunistic moments to meet a user’s needs. As Steve Munroe, Exploitation Manager for the EU Provenance project at the University of Southampton, explains, “These services may belong to different stakeholders operating under various different policies about information sharing. The results provided by such a composition of services must, however, be trusted by the user and yet, when the services disband, how are we to obtain the verification of the processes that contributed to the final result?”
This is why the IST-funded EU Provenance project is developing mechanisms that enable such results to be trusted. These mechanisms allow the processes that contributed to a given result to be inspected, and checks to be made that the correct processes were used. In Grid applications, the diverse actors and complex processes behind each result make provenance especially important. In addition, “many Grid applications (such as organ transplant management) must obey a variety of regulations imposed by different governing bodies,” says Munroe.
“Provenance can again be used to determine that a given process has adhered to the necessary regulations, thus enabling the end user to place trust in the results received.” In response, the project has gathered a set of user requirements (a task carried out by project partner MTA SZTAKI) and used them to derive the software requirements for the provenance architecture, a first public version of which is now available. The requirements cover both the logical and the process architecture of provenance systems. The logical architecture defines the components of a system for recording, maintaining, visualizing, reasoning about and analyzing process documentation, whereas the process architecture addresses scalability and security. Both are technology-independent, so they can be reused and applied to different technologies.
The architecture is generic in the sense that the project has identified and designed a core set of functionality that any industrial-strength provenance architecture should have. Turning it into a real-world instance then involves implementing the logical architecture and working extensively with experts to integrate the system with applications in the target domain. The logical architecture is currently being implemented by IBM, the University of Southampton and the University of Wales, Cardiff. This implementation will then serve as the basis for making example applications in aerospace engineering and organ transplant management provenance-aware; these applications are managed by project partners the German Aerospace Center (DLR) and the Universitat Politecnica de Catalunya, respectively. Demonstrators for both applications are due this year and will be used to evaluate the architecture. “We also are developing a methodology that will facilitate the development of provenance-aware systems in other domains,” adds Munroe.
“Furthermore, we aim to develop preliminary standardization proposals for provenance systems to submit to the relevant standardization bodies.” Ultimately, such research will allow information generated and managed within a Grid infrastructure to be verified and trusted. This means that the information’s history, including the processes that created and modified it, is documented in a way that can be inspected, validated and reasoned about by authorized users who need to ensure that the information and its controls have not been altered, abused or tampered with.