What is Diamond?

Diamond is a project that began in 2002 as a collaboration between Intel Research Pittsburgh and Carnegie Mellon University. The collaboration has since expanded to include the University of Pittsburgh, UPMC, and Merck Research Labs. It uses a unique Open Collaborative Research model that encourages research publication and the release of open source software rather than proprietary development. We welcome interest in Diamond collaboration from other companies and universities. For companies that would like to benefit from Diamond but do not wish to participate in open collaborative research, Carnegie Mellon University is in the early stages of creating a Diamond consortium.

Diamond's goal is to enable interactive search of Internet data repositories that store vast amounts of complex, non-indexed data such as digital photographs, video streams, and medical images. The Diamond architecture can be mapped to a variety of storage back-ends such as SANs, blade servers on LANs, Internet servers, and active disks. This research is centered on a storage architecture and open-source prototype implementation called the OpenDiamond® platform for interactive search. At the heart of the OpenDiamond platform is the concept of early discard: the ability to reject irrelevant data items very close to their point of storage. Since the knowledge needed to recognize irrelevant data is domain-specific, early discard requires application code called a searchlet to be executed close to storage. This makes brute-force interactive search practical by eliminating irrelevant data as cheaply as possible, before it is transferred to the user's application. Further, the OpenDiamond platform embodies the concept of self-tuning. This allows it to adapt dynamically to different hardware configurations, workloads, and data content in a manner that is completely transparent to users and applications.
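
The early discard idea described above can be illustrated with a small Python sketch. This is not the OpenDiamond API; the names DataItem, Searchlet, and scan_storage, as well as the two example filters and their attributes, are hypothetical and chosen only for illustration. The sketch assumes a searchlet can be modeled as an ordered list of domain-specific filters that run on the storage node, with an item dropped as soon as any filter rejects it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class DataItem:
    """A stored object plus a few precomputed attributes (hypothetical)."""
    name: str
    attributes: Dict[str, float]


# A filter is domain-specific code that decides whether an item may be relevant.
Filter = Callable[[DataItem], bool]


@dataclass
class Searchlet:
    """Hypothetical model of a searchlet: an ordered chain of filters."""
    filters: List[Filter]
    examined: int = 0
    discarded: int = 0

    def accepts(self, item: DataItem) -> bool:
        self.examined += 1
        for check in self.filters:
            if not check(item):
                # Early discard: stop at the first failing filter.
                self.discarded += 1
                return False
        return True


def scan_storage(items: Iterable[DataItem], searchlet: Searchlet) -> Iterator[DataItem]:
    """Runs near the data; only surviving items are passed on to the client."""
    for item in items:
        if searchlet.accepts(item):
            yield item


# Two illustrative filters for a photo search (attribute names are made up).
def is_large_enough(item: DataItem) -> bool:
    return item.attributes.get("width", 0) >= 640


def is_mostly_red(item: DataItem) -> bool:
    return item.attributes.get("red_fraction", 0.0) > 0.5


repository = [
    DataItem("a.jpg", {"width": 320, "red_fraction": 0.9}),
    DataItem("b.jpg", {"width": 1024, "red_fraction": 0.7}),
    DataItem("c.jpg", {"width": 800, "red_fraction": 0.1}),
]

searchlet = Searchlet([is_large_enough, is_mostly_red])
results = [item.name for item in scan_storage(repository, searchlet)]
print(results, searchlet.examined, searchlet.discarded)
```

In this sketch the cheaper filter is listed first, so most rejected items never reach the later one. Choosing that order by hand is a simplification made here; the platform's self-tuning is described as handling such adaptation transparently.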

More broadly, Diamond's goal is to help domain experts discover something relevant to a task in a large distributed repository of complex, non-indexed, and loosely structured data. Suppose, for example, that a pharmaceutical researcher wishes to identify adverse effects of a drug in a large collection of automated cell microscopy images. The term "adverse effects" refers to a vague concept; a more precise definition can only be given after examining the data in some depth. In other words, hypothesis formation and hypothesis validation proceed hand in hand in a tightly coupled, iterative sequence. We refer to this inherently human-centric activity as interactive data exploration. Medical and pharmaceutical researchers at UPMC, the University of Pittsburgh School of Medicine, and Merck are collaborating with Diamond researchers to apply Diamond to their domain-specific tasks. This may open the door to research and diagnostic strategies that were not considered feasible until now.

OpenDiamond is a registered trademark of Carnegie Mellon University.