What is Diamond?
Diamond is a project
that began in 2002 as a collaboration between Intel Research Pittsburgh and Carnegie Mellon University. The
collaboration has since expanded to include the University of Pittsburgh, UPMC, and Merck Research Labs.
The collaboration uses a unique Open Collaborative
Research model
that encourages research publication and release of open source
software rather than proprietary development. We welcome
interest from other companies and universities in collaborating on
Diamond. For companies that would like to benefit from
Diamond but do not wish to participate in open collaborative research,
Carnegie Mellon University is in the early stages of creating a Diamond
consortium.
Diamond's goal
is to enable interactive search of Internet data repositories that
store vast amounts of complex, non-indexed data such as digital photographs, video
streams, and medical images. The Diamond
architecture can be mapped to a variety of storage back-ends such as SANs, blade
servers on LANs, Internet servers, and active disks. This research is
centered on a storage architecture and open-source prototype implementation called
the OpenDiamond® platform for interactive search. At the heart of the OpenDiamond
platform is the concept of early
discard, or the ability to reject irrelevant data items very
close to their point of storage. Since the knowledge needed to
recognize irrelevant data is domain-specific, early discard requires
application code called a searchlet
to be executed close to storage. This
makes brute-force interactive search practical by eliminating
irrelevant data as cheaply as possible. Further, the OpenDiamond
platform embodies the
concept of self-tuning. This allows it to adapt dynamically to
different hardware configurations, workloads, and data content in a
manner that is completely transparent to users and applications.
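
The early-discard idea described above can be illustrated with a minimal Python sketch. It is not the OpenDiamond API: the names Searchlet, search_node, is_color, and is_large, and the dictionary layout of each data item, are all hypothetical and chosen only for illustration. The sketch assumes a searchlet can be modeled as an ordered chain of domain-specific filters that runs next to storage and drops an item as soon as any filter rejects it, so that only surviving items travel to the user.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

# Hypothetical type for illustration only; not part of the OpenDiamond platform.
Filter = Callable[[dict], bool]


@dataclass
class Searchlet:
    """An ordered chain of domain-specific filters (assumed structure)."""
    filters: List[Filter]

    def accepts(self, item: dict) -> bool:
        # all() short-circuits, so evaluation stops at the first
        # filter that rejects the item (the "early discard").
        return all(f(item) for f in self.filters)


def search_node(items: Iterable[dict], searchlet: Searchlet) -> Iterator[dict]:
    """Runs close to storage and yields only items that pass every filter."""
    for item in items:
        if searchlet.accepts(item):
            yield item


# Two toy filters standing in for domain-specific code.
def is_color(item: dict) -> bool:
    return item["channels"] == 3


def is_large(item: dict) -> bool:
    return item["width"] * item["height"] >= 1_000_000


# A made-up repository of image metadata.
repository = [
    {"name": "a.jpg", "channels": 3, "width": 2000, "height": 1500},
    {"name": "b.jpg", "channels": 1, "width": 2000, "height": 1500},
    {"name": "c.jpg", "channels": 3, "width": 640, "height": 480},
]

searchlet = Searchlet([is_color, is_large])
results = [item["name"] for item in search_node(repository, searchlet)]
print(results)  # ['a.jpg']
```

In this sketch the filters run in a fixed order. A self-tuning system could, for instance, reorder them based on measured cost and rejection rate, but that is not shown here and is not a claim about how the platform implements self-tuning.
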
More broadly,
Diamond's goal is to help domain experts discover something relevant to
a task in a large
distributed repository of complex, non-indexed, and loosely structured
data. Suppose, for example, a pharmaceutical researcher
wishes to identify adverse effects
of a drug in a large collection of automated cell microscopy images.
The term "adverse effects" refers to a vague concept. A more precise
definition can only be given after examining the data in some
depth. In other words, hypothesis formation and hypothesis validation
proceed hand in hand in a tightly coupled, iterative sequence. We
refer to this inherently human-centric activity as interactive data exploration.
Medical and pharmaceutical researchers at UPMC, University of
Pittsburgh School of Medicine,
and Merck are collaborating with Diamond researchers to apply Diamond
to their domain-specific tasks. This may open the door to research
and diagnostic strategies that were not considered feasible until now.
OpenDiamond is a registered trademark of
Carnegie Mellon University