
FIFE Notes - February 2016

News for Distributed Computing at Fermilab


Best in class

This newsletter is brought to you by:

  • Ken Herner
  • Bo Jayatilaka
  • Mike Kirby
  • Katherine Lato
  • Tanya Levshina
  • Anna Mazzacane
  • Kevin Retzke
  • Brian P. Yanny

We welcome articles you might want to submit. Please email fife-group@fnal.gov.

Previous newsletters are available here.


New sites for MicroBooNE

The MicroBooNE collaboration operates a 170-ton Liquid Argon Time Projection Chamber (LAr TPC) located on the Booster neutrino beam line at Fermilab. One of the critical steps in advancing LAr TPC-based experiments is the development of pattern recognition algorithms and software to analyze the recorded data, and with that comes an increased need for computing resources.

[Image: MicroBooNE collection-plane event display, run 3472, subrun 63, event 3172]

MicroBooNE was the first Fermilab experiment to run applications at Clemson University’s OSG cluster through opportunistic access. The combination of Fermilab, university, and opportunistic OSG resources will play a critical role in the successful analysis of MicroBooNE’s data, in precision measurements of neutrino cross sections, and in searches for sterile neutrinos. More information

__________

Measuring the universe -- one galaxy at a time

The Dark Energy Survey (DES) is in its third year of gathering multi-color digital images of large swaths of deep space from a mountaintop in Chile. The second stage of processing, in which overlapping exposures are registered and combined into deep images of the sky and individual objects are carefully measured for shape, brightness and position, is demanding of compute resources: these jobs require up to 64 GB of RAM, 2 TB of scratch disk and several linked cores, and occasionally run for more than 24 hours each.
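
For a sense of scale, a request of this size might be expressed through the FIFE jobsub layer roughly as sketched below. The Python snippet only assembles the command line; the option names, unit syntax, experiment group and script name are illustrative assumptions, not the actual DES production configuration.

    # Sketch: assembling a large-memory jobsub_submit command in Python.
    # Option names and unit syntax are assumed from typical jobsub_client usage;
    # check `jobsub_submit --help` for the authoritative list.
    import subprocess

    cmd = [
        "jobsub_submit",
        "-G", "des",                     # experiment group (assumed)
        "--memory=64GB",                 # up to 64 GB of RAM per job
        "--disk=2000GB",                 # roughly 2 TB of scratch disk
        "--cpu=4",                       # several linked cores
        "--expected-lifetime=48h",       # some jobs exceed 24 hours
        "file:///path/to/des_coadd.sh",  # hypothetical worker script
    ]

    print(" ".join(cmd))   # review the command first...
    # subprocess.run(cmd)  # ...then submit it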


The FIFE group and the Fermilab Scientific Computing Division have played key roles in providing the tools that make these dynamic resources available in an efficient, flexible and widely distributed manner. More information

__________

OPOS helping MINERvA with offline production

One of the benefits of running production jobs for several experiments is that experience can be shared. For example:

  • OPOS helped NOvA create procedures for running keep-up jobs, and later helped MINERvA create their own procedures based on that work. Setting up these jobs was easier because of the OPOS team’s experience.

  • OPOS helped MINOS change their workflow so they could use the samweb tool. OPOS can transfer this successful work to MINERvA to enable them to begin using samweb; a brief sketch of a samweb query follows this list.
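
For a flavor of what "using samweb" looks like in practice, the sketch below queries the SAM data catalog by shelling out to the samweb command line from Python. The dataset definition name is hypothetical, and the exact syntax should be checked against your experiment's samweb documentation.

    # Sketch: listing the files in a SAM dataset definition via the samweb CLI.
    # The definition name is hypothetical; verify the command syntax against
    # your experiment's samweb documentation.
    import subprocess

    definition = "minerva_example_dataset"   # hypothetical dataset definition

    result = subprocess.run(
        ["samweb", "list-files", "defname: " + definition],
        capture_output=True, text=True, check=True,
    )

    files = result.stdout.split()
    print("files in definition %s: %d" % (definition, len(files)))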

More information

__________

Know before you go (on the OSG)

In most situations, the FIFE Group recommends that users not specify sites explicitly when submitting jobs to OSG locations. Occasionally, however, users want or need to send certain types of jobs only to a specific set of remote OSG sites. The FIFE Group maintains a wiki page with information about each OSG site that supports FIFE experiments. Note that even if a job’s resource requests fit at a particular site, there is no guarantee that the job will start there. More information
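
If you do need to pin jobs to particular sites, the usual handle is the site list given at submission time. The sketch below builds such a jobsub_submit command in Python; the site names, experiment group and option spellings are illustrative assumptions, so check them against the wiki page and the output of jobsub_submit --help.

    # Sketch: restricting a submission to a specific set of OSG sites.
    # Site names, the group, and option spellings are examples only.
    import subprocess

    sites = ["Clemson", "Nebraska", "SU-OG"]         # example site names; verify on the wiki
    cmd = [
        "jobsub_submit",
        "-G", "uboone",                              # experiment group (example)
        "--resource-provides=usage_model=OFFSITE",   # run off-site only
        "--site=" + ",".join(sites),                 # comma-separated site list
        "file:///path/to/my_job.sh",                 # hypothetical job script
    ]

    print(" ".join(cmd))   # review the command first...
    # subprocess.run(cmd)  # ...then submit it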

__________

Coming soon to Fifemon: job resource monitoring

The FIFE monitoring application, Fifemon, has a number of new features now available for testing in pre-production (https://fifemon.fnal.gov/monitor-pp/). Among them is the ability to compare job resource requests to actual usage, which can help you optimize future requests and make more efficient use of the grid.

[Image: Fifemon cluster summary view]

As a general rule of thumb, the higher your resource requests, the longer it will take your jobs to start running. You are encouraged to test these new features in pre-production and provide feedback. More information
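
The kind of comparison these views enable can be illustrated with a small, made-up calculation; in practice the requested and peak-used values would come from the monitoring pages for your own recent jobs.

    # Sketch: comparing requested memory to peak usage for a handful of jobs.
    # The numbers are invented for illustration; Fifemon would supply real ones.
    requested_mb = 4000
    peak_used_mb = [1210, 1350, 1190, 1420, 1280]   # peak usage per job, in MB

    worst_case = max(peak_used_mb)
    suggested_mb = int(worst_case * 1.2)            # roughly 20% safety margin

    print("requested %d MB, worst-case usage %d MB" % (requested_mb, worst_case))
    print("a request near %d MB would still cover every job" % suggested_mb)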

__________

Optimizing job submissions

Carefully tailoring your resource requests will increase your job throughput


With partitionable slots now the norm on GPGrid, it’s important to have a good understanding of your resource requirements. There may be free slots with fewer resources than the defaults, left over from the way the cluster happened to be partitioned at a given moment. As an analogy, imagine going to a busy restaurant (lots of red dots on the seating map below). If you insist on a table, you may have to wait, unless you’re willing to sit at the bar.

[Image: restaurant seating map, mostly full (red dots)]

The same holds true in a computing environment. If you have a workflow that consistently uses fewer resources than the default request, it makes sense to lower your resource request and take advantage of the small slots that the default request cannot use, just like sitting at the bar. By doing so, your jobs will start faster. More information
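
As a hypothetical illustration, a workflow whose jobs never use more than about 1 GB of memory and a few GB of scratch could be submitted with requests well below the defaults, as in the sketch below. The option names and values are assumptions in the style of typical jobsub_client usage, not GPGrid defaults.

    # Sketch: requesting only what the workflow needs, so jobs can match the
    # small leftover pieces of partitionable slots (i.e. "sit at the bar").
    import subprocess

    cmd = [
        "jobsub_submit",
        "-G", "nova",                     # experiment group (example)
        "--memory=1200MB",                # observed peak usage plus headroom
        "--disk=5GB",                     # modest scratch requirement
        "--expected-lifetime=3h",         # short jobs back-fill more easily
        "file:///path/to/short_job.sh",   # hypothetical job script
    ]

    print(" ".join(cmd))   # review the command first...
    # subprocess.run(cmd)  # ...then submit it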

To provide feedback on any of these articles, or on FIFE Notes in general, please email fife-group@fnal.gov.

The complete material (for viewing offline) is available in the following formats: