dCache: scaling out to new heights

The dCache data storage system has been the dependable workhorse of high-energy physics experiments worldwide for the last 15 years. At Fermilab it was first adopted by the CDF Tevatron experiment and then became the backbone of storage at the regional CMS Tier-1 data center. The digitized traces of the Higgs boson were hidden in a haystack of hundreds of petabytes of data delivered by dCache before analysis software reconstructed them to reveal a long-sought missing piece of the Standard Model. Efficient data storage underpins high-throughput scientific research, and dCache has played a very important role in delivering this and many other major scientific results.

The public dCache instance at Fermilab has served the needs of a diverse community of customers for many years, including neutrino experiments, astrophysics, lattice QCD and the database group. Since the fall of 2013, the system has been actively used by intensity frontier experiments and has been scaled out dramatically, from just over 100 terabytes to more than five petabytes. dCache now routinely delivers more than 5 million files per day (reads and writes combined) and moves about 500 terabytes of data per day, a sustained average of roughly 5.8 gigabytes per second. This level of performance puts it on par with the ATLAS and CMS Tier-1 sites at BNL and Fermilab.

This fiscal year, we have taken our first steps toward leveraging our storage expertise to expand our user base by launching the Active Archive Facility project (see http://archive.fnal.gov), which gives researchers access to our storage facility through the Strategic Partnership Project (SPP) mechanism. Our first major customer is the Simons Foundation Genome Diversity Project, whose participants actively use public dCache to store and access their data over the WAN.

The world map below shows the distribution of dCache clients that have transferred at least one terabyte of data in the last three months.

[Image: dcache_map.png]
The successful operation of the dCache instances at Fermilab is made possible in part by the Data Movement and Development group's direct involvement in dCache development, within a framework of international collaboration among DESY, Fermilab and NDGF. Over the years, dCache software has evolved to embrace industry standards such as parallel NFS (pNFS) and WebDAV while maintaining and improving popular domain-specific protocols like GridFTP, XRootD, SRM and dCap. In fact, dCache provides its own fully compliant XRootD server implementation, written in Java. We work closely with the IFDH and SAM developers, who provide a set of user-friendly tools that hide these protocol specifics from the end user, as the sketch below illustrates.
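As a rough illustration of what those standard protocols buy the end user, the sketch below reads the same file first through a pNFS mount, where dCache looks like an ordinary POSIX filesystem, and then through the WebDAV door over HTTPS. The mount point, hostname, file path and credential locations are hypothetical placeholders, not real Fermilab endpoints.

    # A minimal sketch of two standards-based ways to read a file from dCache.
    # All paths, hostnames and credentials below are hypothetical placeholders.

    import requests  # WebDAV reads are plain HTTPS GETs, so any HTTP client works

    PNFS_PATH = "/pnfs/example.gov/data/run42/events.dat"       # hypothetical file
    WEBDAV_URL = "https://dcache.example.gov:2880" + PNFS_PATH  # 2880: dCache's default WebDAV port

    # 1) pNFS: with the dCache namespace mounted via NFS 4.1, ordinary
    #    POSIX I/O is all a client needs -- no special library required.
    with open(PNFS_PATH, "rb") as f:
        first_bytes = f.read(64)

    # 2) WebDAV: the same file served by dCache's WebDAV door, fetched
    #    with a standard HTTPS GET, authenticated here with an X.509 proxy.
    resp = requests.get(
        WEBDAV_URL,
        cert="/tmp/x509up_u1000",                  # hypothetical proxy certificate
        verify="/etc/grid-security/certificates",  # site CA directory
    )
    resp.raise_for_status()
    assert resp.content[:64] == first_bytes  # both doors serve the same bytes

In day-to-day use, tools such as ifdh cp make even this much detail unnecessary by choosing a protocol and endpoint on the user's behalf.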

Performance demands are always on the rise, and increased user load does not always arrive smoothly; there is always something to work on. Issues with the pNFS Linux client, ripple effects caused by pool nodes going offline, and corner cases that produce unanticipated behavior keep us occupied.

We look forward to new challenges with optimism. Because the system was designed to be highly scalable and to adapt to changing load patterns, it is capable of meeting ever-increasing data throughput needs.

Please open a Service Desk ticket if you have problems or if dCache does not work as expected. Your problem reports drive continuous code improvement and result in a better dCache product.

Dmitry Litvintsev & Gene Oleynik