The Globus Alliance
Web Page: http://dev.globus.org/wiki/Google_Summer_of_Code_2010_Ideas
Mailing List: There is no single main mailing list. See http://dev.globus.org/wiki/Mailing_Lists for a list of mailing lists. "gt-user" is, however, the mailing list that tends to attract most general queries (https://lists.globus.org/mailman/listinfo/gt-user)
The Globus Alliance is a community of organizations and individuals developing fundamental technologies behind the "Grid," which lets people share computing power, databases, instruments, and other on-line tools securely across corporate, institutional, and geographic boundaries without sacrificing local autonomy. Since its creation in 1996, the Globus Alliance has been committed to developing open source software, although development was initially carried out by a small number of university research groups. However, since transitioning in 2005 to an open governance model (http://dev.globus.org/), derived from Apache's Jakarta project, the scope of participants has widened to include many more groups around the world, including companies and individuals. Globus currently hosts more than 20 projects, actively developed by a community of more than 100 committers and spanning a variety of technology concerns in grid systems: common runtime, data management, information management, security, and documentation. Additionally, members of the Globus community can propose new projects which, after an "incubation" process (http://dev.globus.org/wiki/Incubator/Incubator_Process), can escalate to full Globus projects. There are currently 25 active projects in incubation.
Actual source code produced by the student participants in Google Summer of Code™ 2010 for The Globus Alliance can be found at this code repository: http://code.google.com/p/google-summer-of-code-2010-globus/
Projects
- A 'GoogleMap' for scientific services and workflows Web services are now widely used in practice. For example, many scientific functions are offered as web services. However, using these web services conveniently is a problem, especially for those with limited IT expertise. This project intends to build a "google map" for scientific services and workflows to make them more usable.
- A RESTful service for orchestration of third party transfers and credential delegation using OAuth The goal of this project is to develop a RESTful service which orchestrates third party transfers and handles credential delegation using OAuth. This will make credential delegation easy to use from the web and from web services. The service will be developed in an extensible manner so that any transfer service can be used, and will provide a concrete example with GridFTP as the transfer service.
- A Spot Instances Approach for Scientific Clouds In many environments, cloud resources spend much time unoccupied due to low demand on the cloud infrastructure. One way to maximize the utilization of the cloud infrastructure is to allow clients to bid on unused cloud capacity and thus obtain lower allocation costs. This project's goal is to extend the Nimbus Workspace Service to allow clients to bid on and allocate cloud resources in the manner of Amazon EC2 Spot Instances, targeted at science needs.
- Application to "Annotation Generation and Analysis Supporting Services-Oriented Scientific Workflow Discovery and Composition" Web services allow scientists to reuse existing applications to save time in creating scientific workflows. Selecting a web service to use can be time-consuming and difficult. There is room for improvement by providing web services with annotations, that is, information about the service and its performance. Annotations could be added and analyzed by an automated program, which could also recommend to scientists which services to use in their scientific workflows.
- Making the Swift Parallel Scripting System Easy to Install, Evaluate and Learn on Readily Available Computing Resources Swift is an extremely useful scripting system for leveraging the power of parallel computing. Recently, I have had the experience of going through the installation process, setup, and wading through user documentation as a new user. I believe that the entry path for new users has much room for improvement. Below I will describe my vision for a project aimed at smoothing the new user transition, a list of specific deliverables, and why I believe I am the right person to work on this project.
- Nimbus and Hadoop Distributed File System (HDFS) Integration This project aims to complete the proposal put forward by the Nimbus project for the use of HDFS as back end storage for VMs. Two major aspects of this project will be the development of an adapter to the workspace-control allowing HDFS locations to be used when staging VMs, as well as the development of a security scheme to control user access to the file system.
- Profiling the new GridFTP server The goal of this project is to do a full system profile of the Globus GridFTP server, including CPU and memory profiling. From the system profiling information, I will learn to understand the tradeoff between single client performance and server load, and the impact of several parameters on data transfer rate and server CPU utilization. I will identify configurations that improve client performance without increasing server load, and enable users to configure and optimize their systems.
- Storage system support for Swift Swift is a parallel scripting system for computing at petascale level. Building a system at this scale is not an easy task. The particular case of data management is handled using collective data management and data-aware dispatching. However, there is still room for improvement. We propose to use a shared file system deployed across all storage nodes and to enable the application to pass hints about its data usage patterns. These hints are used by the storage layer to optimize its operations.
- Unix Domain Sockets for XIO as transport driver. This project will implement Unix domain sockets as an XIO transport driver, which will be used to coordinate the submission of jobs via a single job manager process in GRAM5. It will also include data descriptor code for sending access rights to file descriptors between processes.
- Visualizing and summarising high volume, wide-area data movement with statistical methods and association rules. A graphical application for processing data and generating visualizations and summary statistics from data transfer records. The application would be used to create and edit report schemas that define what statistics and visualizations are generated from the data. The aim is not to replicate data mining and visualization software implemented elsewhere, but to leverage them through plugins and to produce a simple UI for visualizing data mining results and generating reports.
- XIO Reliable Multicast application The project aims to implement a type of multicast similar to Reliable Blast UDP. The idea is to achieve reliability as high as TCP's, with less overhead. Another part is to test the reliability and performance of the solution. The main deliverable will be the source code for the driver. Other deliverables are a performance/reliability study and a user guide. The reliability tests will simulate lost packets. The performance study will compare the driver to round robin file transfers and udpcast.
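
The idea behind the Reliable Blast UDP approach named in the XIO Reliable Multicast project can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the project's driver: it is written in Python rather than against the Globus XIO API, it models a single receiver rather than a multicast group, and the function name `blast_transfer`, the `loss_rate` parameter, and the random packet-drop model are all hypothetical. The sender blasts every outstanding packet over a lossy channel, the receiver reports the gaps over a channel assumed to be reliable, and only the missing packets are sent again.

```python
import random


def blast_transfer(chunks, loss_rate=0.3, seed=0, max_rounds=50):
    """Simulate a blast-and-retransmit transfer over a lossy channel.

    Returns the reassembled payload and the number of blast rounds used.
    All names and the loss model are illustrative assumptions.
    """
    rng = random.Random(seed)           # seeded so the simulated loss is repeatable
    received = {}                       # sequence number -> payload at the receiver
    missing = list(range(len(chunks)))  # sequence numbers still to be delivered
    rounds = 0
    while missing:
        if rounds >= max_rounds:
            raise RuntimeError("transfer did not complete")
        rounds += 1
        # Blast phase: send every outstanding packet without waiting for acks.
        for seq in missing:
            # Each packet is dropped independently with probability loss_rate.
            if rng.random() >= loss_rate:
                received[seq] = chunks[seq]
        # Control phase: the receiver reports which sequence numbers are absent.
        missing = [seq for seq in range(len(chunks)) if seq not in received]
    data = b"".join(received[seq] for seq in range(len(chunks)))
    return data, rounds


if __name__ == "__main__":
    chunks = [bytes([i]) * 4 for i in range(20)]
    data, rounds = blast_transfer(chunks, loss_rate=0.3, seed=1)
    print(data == b"".join(chunks), rounds)
```

Simulating lost packets this way is also how a reliability test could be driven: raising `loss_rate` increases the number of rounds needed, while the reassembled data should still match the input.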
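
The Unix domain sockets project above mentions sending access rights to file descriptors between processes. On POSIX systems this is typically done with an `SCM_RIGHTS` ancillary message. The following minimal sketch uses Python's standard `socket` module rather than the Globus XIO C API, so it shows the underlying mechanism only. The helper names `send_fd`, `recv_fd`, and `demo`, and the message text, are hypothetical, and a `socketpair` inside one process stands in for two cooperating processes such as a client and a job manager.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one open file descriptor over a Unix domain socket."""
    fds = array.array("i", [fd])
    # At least one byte of ordinary data must accompany the ancillary message.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one file descriptor that was sent with send_fd."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole integer.
            fds.frombytes(cdata[:len(cdata) - (len(cdata) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo():
    """Pass the read end of a pipe across a socket pair and read through it."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        send_fd(left, read_end)
        new_fd = recv_fd(right)        # a new descriptor for the same pipe
        os.write(write_end, b"job submitted")
        result = os.read(new_fd, 64)
        os.close(new_fd)
        return result
    finally:
        os.close(read_end)
        os.close(write_end)
        left.close()
        right.close()


if __name__ == "__main__":
    print(demo())
```

The received descriptor is a new number in the receiving process that refers to the same open file, which is what lets one long-lived process hand work or I/O channels to another without reopening anything.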