What is Diamond?
Diamond is a project that began in 2002 as a collaboration between Intel and Carnegie Mellon University. Over time, the research has evolved to span many other institutions, including the University of Pittsburgh, UPMC, Merck Research Labs, IBM Research, Georgia Institute of Technology, Rice University, and Duke University. For an account of the early phases of this evolution and the emergence of key Diamond concepts and mechanisms, see the paper "Searching Complex Data Without an Index".
Diamond's goal is to enable interactive search of Internet data repositories that store vast amounts of complex, unindexed data such as digital photographs, video streams, and medical images. This research is centered on a storage architecture and open-source implementation called the OpenDiamond® platform. At the heart of this platform is the concept of early discard: the ability to reject irrelevant data items very close to their point of storage. Since the knowledge needed to recognize irrelevant data is domain-specific, early discard requires application code called a searchlet to be executed close to storage. The OpenDiamond platform also embodies the concepts of result caching and self-tuning. These allow it to leverage work done in previous searches and to adapt dynamically to different hardware configurations, workloads, and data content in a manner that is completely transparent to users and applications.
A mechanism called scoping enables Diamond searches to span structured data sources (such as a relational database) as well as unstructured data (such as images). Modules called data retrievers enable searches over a wide range of data sources, including live data from webcams and dynamic content sources such as GigaPan. Layered on top of the OpenDiamond platform are a number of open-source applications that are customized for specific domains and data types.
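The ideas of early discard and result caching can be illustrated with a small sketch. This is not the OpenDiamond API: the `Searchlet` class, the `search` function, the filter names, and the sample items are all hypothetical, and the sketch assumes that a filter's name uniquely identifies its code so that cached verdicts can be reused safely.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical types for illustration only; not the OpenDiamond API.
Item = Dict[str, Any]
Filter = Callable[[Item], bool]


@dataclass
class Searchlet:
    """An ordered chain of named filters; an item must pass all of them."""
    filters: List[Tuple[str, Filter]]
    # Result cache: (filter name, item id) -> verdict, reused across searches.
    # Assumes a filter name always refers to the same filter code.
    cache: Dict[Tuple[str, str], bool] = field(default_factory=dict)
    evaluations: int = 0  # number of filter executions actually performed

    def accepts(self, item: Item) -> bool:
        for name, fn in self.filters:
            key = (name, str(item["id"]))
            if key not in self.cache:
                self.cache[key] = fn(item)
                self.evaluations += 1
            if not self.cache[key]:
                return False  # early discard: remaining filters are skipped
        return True


def search(storage: Iterable[Item], searchlet: Searchlet) -> Iterator[Item]:
    """Stands in for code running near storage; only survivors are returned."""
    return (item for item in storage if searchlet.accepts(item))


# Made-up sample data standing in for a repository of unindexed items.
storage = [
    {"id": "a", "kind": "image", "mean_brightness": 0.82},
    {"id": "b", "kind": "video", "mean_brightness": 0.90},
    {"id": "c", "kind": "image", "mean_brightness": 0.10},
]

searchlet = Searchlet(filters=[
    ("is_image", lambda it: it["kind"] == "image"),
    ("is_bright", lambda it: it["mean_brightness"] > 0.5),
])

first = [it["id"] for it in search(storage, searchlet)]
evals_first = searchlet.evaluations
second = [it["id"] for it in search(storage, searchlet)]
evals_second = searchlet.evaluations - evals_first
print(first, evals_first, evals_second)
```

In this toy run, item "b" is discarded by the first filter and never reaches the second, and the repeated search is answered entirely from the cache. A real deployment would run such filters on the storage servers themselves, so that discarded items never cross the network.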
More broadly, Diamond's goal is to help domain experts discover something relevant to a task in a large distributed repository of complex, unindexed, and loosely structured data. Suppose, for example, that a pharmaceutical researcher wishes to identify adverse effects of a drug in a large collection of automated cell microscopy images. The term "adverse effects" refers to a vague concept. A more precise definition can only be given after examining the data in some depth. In other words, hypothesis formation and hypothesis validation proceed hand in hand in a tightly coupled, iterative sequence. We refer to this inherently human-centric activity as interactive data exploration. To the best of our knowledge, Diamond was the first system, and is currently the only one, to provide this capability.
OpenDiamond is a registered trademark of Carnegie Mellon University.