http://www.aero.org/publications/crosslink/winter2005/04.html
The NPOESS Preparatory Project: Architecture and Prototype Studies
Samuel Gasster, Sheri Benator, and David Bart
Aerospace helped NASA develop a system segment architecture for the
NPOESS Preparatory Project using C4ISR as well as an advanced ground
system prototype design using grid computing technology.
The National Polar-orbiting Operational Environmental Satellite System
(NPOESS) represents a convergence of systems previously operated by the
Department of Defense and the National Oceanic and Atmospheric
Administration (NOAA). Scheduled for launch in 2009, it will support a
broad range of activities in global environmental monitoring,
meteorology, and climatology.
This article describes architecture and prototype studies
performed by Aerospace in support of the NPP mission. During the course
of these studies, the NASA acquisition strategy for NPP was changed. As
a result, NASA did not directly apply the results of the architecture
and prototype studies to the acquisition of an NPP ground system;
however, the results of these studies provided both NASA and Aerospace
with valuable lessons learned on many aspects of ground system
architecture and design.
Given the high data rates from the NPP sensors, the system is
expected to generate 2 petabytes (2 million gigabytes) of raw and
processed data products over the 5-year mission life. The Science Data
Segment must provide for the processing, distribution, storage, and
archiving of this data. All of these functions must be managed and
scheduled to allow seamless operation.
Advanced Data Grid
While working with Aerospace to develop the NPP architecture
documentation, NASA realized the Science Data Segment presented novel
challenges in terms of ground system design and implementation. NASA
asked Aerospace to suggest possible approaches for the Science Data
Segment in early 2001. Based on initial architecture definition and
requirements, Aerospace recommended an emerging technology known as
grid computing (see sidebar, "Grid Computing: An Overview,"
http://www.aero.org/publications/crosslink/winter2005/04_sidebar1.html).
Because of the relative lack of maturity of this approach, Aerospace
also recommended the development of a prototype implementation that
would allow NASA to investigate key features as it moved to procure the
full operational Science Data Segment. This prototype implementation
was named the Advanced Data Grid.
The primary goal of the Advanced Data Grid project was to assess the
applicability, effectiveness, and scalability of advanced data
processing and data management technologies to the design and
implementation of future Earth-science data processing systems. A key
objective was to perform this assessment in the context of Science Data
Segment requirements and workflow using grid computing technologies. An
additional objective was to demonstrate the execution of a
scientifically meaningful climate application requiring the management
of massive data sets that would be representative of the type of
application that the NPP science team might develop. Part of the
overall task for Aerospace was to define this application.
An analysis of mission requirements determined the need to develop a
ground data processing, storage, and archival system capable of
handling data rates greater than 10 megabits per second, with possible
reprocessing requirements of 20 times the data rate. The system would
have to store and distribute petabytes of data to a geographically
distributed team over the expected 5-year mission life.
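The figures quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming decimal units (1 petabyte = 10^15 bytes, consistent with "2 million gigabytes") and treating the 10 megabit-per-second floor as a sustained rate; the function names are illustrative and not part of any NPP software.

```python
# Back-of-the-envelope sizing using only figures quoted in the article.
# Assumptions: decimal units, a sustained (not peak) data rate, 365.25-day years.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def volume_terabytes(rate_mbps: float, years: float) -> float:
    """Volume in terabytes (10**12 bytes) accumulated at a sustained rate in megabits/s."""
    bytes_per_second = rate_mbps * 1e6 / 8
    return bytes_per_second * years * SECONDS_PER_YEAR / 1e12


def average_rate_mbps(total_petabytes: float, years: float) -> float:
    """Average rate in megabits/s needed to accumulate a total volume over a period."""
    total_bits = total_petabytes * 1e15 * 8
    return total_bits / (years * SECONDS_PER_YEAR) / 1e6


# Raw volume at the 10 Mbit/s floor over the 5-year mission life.
raw_tb = volume_terabytes(rate_mbps=10, years=5)

# Reprocessing at 20 times the data rate.
reprocessing_rate_mbps = 20 * 10

# Average rate implied by 2 petabytes of raw and processed products in 5 years.
avg_mbps = average_rate_mbps(total_petabytes=2, years=5)

print(f"raw volume at 10 Mbit/s: {raw_tb:.0f} TB")
print(f"reprocessing throughput: {reprocessing_rate_mbps} Mbit/s")
print(f"average rate implied by 2 PB: {avg_mbps:.0f} Mbit/s")
```

Under these assumptions, the 10 megabit-per-second floor alone accounts for roughly a tenth of the 2-petabyte total. The remainder presumably reflects higher actual sensor rates together with processed and reprocessed products, although the article does not break the total down.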
Grid computing seemed like a natural choice for the Science Data
Segment prototype because it directly addresses the issues of
integrating distributed heterogeneous resources as well as the dynamic
scheduling of these resources, discovery and distribution of data for
scientific applications, and the general sharing of computing
resources. In addition, the system could be expanded as needed. Thus,
the project office would not need to implement full system capability
at the start, but could expand the system over the mission lifetime.
This would allow better performance over time and significant cost
savings.
Aerospace worked with NASA to define the Advanced Data Grid
implementation. To simulate and test a wide range of representative
workflows, researchers decided not only to set up sites at Goddard and
Aerospace, but also to incorporate the existing grid-computing capabilities
developed at the Ames Research Center as part of the NASA Information
Power Grid. Thus, the physical implementation for the Advanced Data
Grid involved tying together resources at these three sites using the
NASA Research and Engineering Network. Goddard provided the primary
data processing and storage, Aerospace provided science user simulation
and test management, and the Information Power Grid provided additional
data processing and storage. The Advanced Data Grid team determined
that petabytes of data would not be needed for the workflow
simulations; rather, the testing could be performed using about 60
gigabytes of data and replaying the data where necessary.
A major function that the Advanced Data Grid needed to demonstrate was
the management of a large data set. The selected approach involved the
implementation of a metadata catalog, tools for searching the catalog,
interfaces to back-end storage systems, and tools for retrieving the
found data sets. The implementation would then allow any user to search
the metadata catalog using key words common to a specific problem
domain. The search would return logical pointers to the data sets that
match the user query. The pointers would then be passed to a replica
location service that would identify the physical data set and allow
data transfer to the user's designated site. Implicit in this process
are the authentication and authorization mechanisms required to
maintain information security.
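The lookup flow described above (keyword search, logical pointers, replica location, then transfer) can be sketched in a few lines. This is an illustrative model only, not the Advanced Data Grid implementation: the class names, the data set names, the host names, and the simple allow-list standing in for authentication and authorization are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCatalog:
    """Hypothetical catalog: maps logical data set names to descriptive keywords."""
    entries: dict = field(default_factory=dict)  # logical name -> set of keywords

    def register(self, logical_name, keywords):
        self.entries[logical_name] = {k.lower() for k in keywords}

    def search(self, *keywords):
        """Return logical names whose keywords include every search term."""
        wanted = {k.lower() for k in keywords}
        return sorted(name for name, kws in self.entries.items() if wanted <= kws)


@dataclass
class ReplicaLocationService:
    """Hypothetical service: maps a logical name to its physical copies."""
    replicas: dict = field(default_factory=dict)  # logical name -> list of URLs

    def add_replica(self, logical_name, physical_url):
        self.replicas.setdefault(logical_name, []).append(physical_url)

    def locate(self, logical_name):
        return list(self.replicas.get(logical_name, []))


def find_data(catalog, rls, authorized_users, user, *keywords):
    """Search the catalog, then resolve each logical pointer to physical replicas."""
    # Stand-in for the real authentication and authorization mechanisms.
    if user not in authorized_users:
        raise PermissionError(f"{user} is not authorized to query the catalog")
    return {name: rls.locate(name) for name in catalog.search(*keywords)}


# Example data; every name and URL below is made up for illustration.
catalog = MetadataCatalog()
catalog.register("modis-aerosol-2002", ["MODIS", "aerosol", "2002"])
catalog.register("modis-sst-2002", ["MODIS", "sea surface temperature", "2002"])

rls = ReplicaLocationService()
rls.add_replica("modis-aerosol-2002", "gsiftp://site-a.example/data/aerosol-2002")
rls.add_replica("modis-aerosol-2002", "gsiftp://site-b.example/data/aerosol-2002")

result = find_data(catalog, rls, {"scientist1"}, "scientist1", "modis", "aerosol")
print(result)
```

Keeping logical names separate from physical locations is the point of the design: a data set can be copied or moved between sites without changing what a catalog search returns.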
Aerospace developed a plan and schedule that divided the Advanced Data
Grid project into four phases to be executed over approximately three
years. In the initialization phase, the basic hardware and software
would be acquired and installed at each site; the team would also
conduct internal testing and training, initiate acquisition of test
data sets, develop additional project documentation, and install
project configuration management tools. In the baseline phase, data
grid functionality and interoperability would be demonstrated across
all sites; the team would also establish benchmarks and start defining
and developing science applications. Next, in the grid testing phase,
the team would conduct major data grid testing, performance analysis,
and assessment. Finally, in the application demonstration phase, the team
would run an appropriate climate data application. The Distributed Active
Archive Center at Goddard would serve as the primary source of test
data (the team planned to use MODIS sensor data). The team was also
exploring the possibility of developing a grid capability and a future
grid interface between the Distributed Active Archive Center and the
Advanced Data Grid.
Aerospace and Goddard were well into the initialization phase by summer
of 2003, having implemented initial data processing and storage
capabilities at Goddard, purchased computing hardware for the Aerospace
site, and negotiated network access and Information Power Grid resource
use. Then, faced with programmatic and budgetary constraints, NASA
headquarters revised the acquisition plan for the Science Data Segment,
and the Advanced Data Grid Project was cancelled.
Nonetheless, the Aerospace and NASA team members learned important
lessons about ground-system design and implementation using grid
computing technologies. Their experience suggests that grid-based
ground systems architectures have considerable potential for a wide
range of Aerospace customers because of their ability to support a wide
variety of problem domains, provide cross-program interoperability, and
enable distributed workflow. Grid-based architectures also have
significant potential for cost savings over the life of a program by
allowing the purchase of commodity computing hardware. This allows a
program to keep pace with the rapid change of technology at reasonable
cost.
One of the key benefits of grid computing is the creation of "virtual
organizations" by enabling the sharing of computing resources across
traditional organizational and administrative boundaries. This requires
strong teamwork, the involvement of all the stakeholders, and careful
negotiation of policy issues (e.g., security). On the other hand, the
chosen approach to metadata catalogs did not allow for their
federation, so metadata catalog services were a potential single point
of failure; this issue will need to be addressed. Grid service
standards are changing, and their support by vendors will need to be
considered. Since the termination of the Advanced Data Grid project,
grid and web service standards have partly merged, with the goal of
providing better implementation and stability.
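To illustrate why federation matters for the single-point-of-failure concern, the sketch below queries several independent catalogs and tolerates the loss of any one of them. It is a hypothetical design, not something the Advanced Data Grid implemented: the catalog names, the stub search functions, and the use of `ConnectionError` to signal an unreachable catalog are assumptions for the example.

```python
def federated_search(catalogs, *keywords):
    """Query every catalog, merge the results, and report which catalogs failed.

    `catalogs` maps a catalog name to a search callable. A catalog that raises
    ConnectionError is skipped; an error is raised only if none can be reached.
    """
    results, failures = set(), []
    for name, search in catalogs.items():
        try:
            results.update(search(*keywords))
        except ConnectionError:
            failures.append(name)
    if len(failures) == len(catalogs):
        raise ConnectionError("no metadata catalog reachable")
    return sorted(results), failures


# Stub catalogs standing in for services at different sites (hypothetical).
def site_a_search(*keywords):
    return ["modis-aerosol-2002"]


def site_b_search(*keywords):
    raise ConnectionError("site B catalog is down")


def site_c_search(*keywords):
    return ["modis-aerosol-2002", "modis-aerosol-2003"]


names, down = federated_search(
    {"site-a": site_a_search, "site-b": site_b_search, "site-c": site_c_search},
    "modis", "aerosol",
)
print(names, down)
```

With a single catalog, the failure simulated for site B would stop every search; with a federation, the query still succeeds and the caller learns which catalog was unavailable.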