Best in class
Recent Open Science Grid milestones:
2015 marked the first year since the OSG’s inception that over one billion computational hours were consumed by OSG users.
Most efficient experiments on FermiGrid that used more than 500,000 hours since Dec. 1: MINOS (98.72%) and MINERvA (85.80%).
Most efficient big non-production user on FermiGrid since Dec. 1: Luri A. Oksuzian from MINOS with 98.9% efficiency.
Experiment with the most opportunistic hours on OSG between Dec. 1 and Jan. 31: Mu2e with 13,960,877 hours.
We welcome articles you might want to submit. Please email firstname.lastname@example.org.
Previous newsletters are available here.
New sites for MicroBooNE
The MicroBooNE collaboration operates a 170-ton Liquid Argon Time Projection Chamber (LAr TPC) located on the Booster neutrino beam line at Fermilab. One of the critical steps in advancing LAr TPC-based experiments is the development of pattern recognition algorithms and software to analyze the recorded data. With that comes an increased need for computing resources.
MicroBooNE was the first Fermilab experiment to run applications at Clemson University’s OSG cluster through opportunistic access. The combination of Fermilab, university, and OSG opportunistic resources will play a critical role in the successful analysis of MicroBooNE’s data and the precision measurements of neutrino cross sections and searches for sterile neutrinos. More information
Measuring the universe -- one galaxy at a time
The Dark Energy Survey (DES) is in its third year of gathering multi-colored digital images of large swaths of deep space from a mountaintop in Chile. The second stage of processing, where overlapping exposures are registered and combined into deep images of the sky and individual objects are carefully measured for shape, brightness and position, demands substantial compute resources. These jobs require up to 64 GB of RAM, 2 TB of scratch disk and several linked cores, occasionally running for more than 24 hours per job.
The FIFE group and the Fermilab Scientific Computing Division have played key roles in providing tools that make these dynamic systems available in an efficient, flexible and widely distributed manner. More information
Helping MINERvA with offline production
One of the benefits of running production jobs for several experiments is that experience can be shared. For example:
OPOS helped NOvA create procedures for running Keepup jobs. OPOS later helped MINERvA create their procedures using that work. Setting up these jobs was easier because of the experience of the OPOS team.
OPOS helped MINOS change its workflow so it could use the samweb tool. OPOS can transfer this successful work to MINERvA so that MINERvA can begin using samweb.
Know before you go (on the OSG)
In most situations, the FIFE Group recommends that users not specify sites explicitly when submitting jobs to the OSG. Occasionally, however, users want or need to send certain types of jobs only to a specific set of remote OSG sites.
The FIFE Group maintains a Wiki page containing information about each OSG site that supports FIFE experiments.
Even if a job's resource requests fit within a particular site's limits, there is no guarantee that the job will start at that site. More Information
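Here is a minimal sketch of the difference between "fits" and "starts". The Python below checks a job's requests against per-site maximums. The site names, limits and the `Resources` type are hypothetical and were invented for illustration. They are not taken from the FIFE Wiki page. Passing this check only means a site could accept the job. Whether the job actually starts there also depends on current load and opportunistic availability.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    memory_mb: int
    disk_mb: int
    cpus: int
    hours: float


# Hypothetical per-site maximums; real values belong on the FIFE Wiki page.
SITE_LIMITS = {
    "SiteA": Resources(memory_mb=2000, disk_mb=20_000, cpus=1, hours=24),
    "SiteB": Resources(memory_mb=8000, disk_mb=50_000, cpus=8, hours=48),
}


def fits(request: Resources, limit: Resources) -> bool:
    """True if every requested quantity is within the site's maximum."""
    return (request.memory_mb <= limit.memory_mb
            and request.disk_mb <= limit.disk_mb
            and request.cpus <= limit.cpus
            and request.hours <= limit.hours)


def candidate_sites(request: Resources) -> list:
    """Sites where the job *could* run; not a promise that it will start there."""
    return [name for name, limit in SITE_LIMITS.items() if fits(request, limit)]


if __name__ == "__main__":
    job = Resources(memory_mb=4000, disk_mb=10_000, cpus=1, hours=12)
    print(candidate_sites(job))  # ['SiteB']
```

A job like the DES coadd jobs described above (up to 64 GB of RAM) would match neither hypothetical site. That is exactly the kind of case where checking site information before submitting pays off.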
Coming soon to Fifemon: job resource monitoring
The FIFE monitoring application, Fifemon, has a number of new features now available for testing in pre-production (https://fifemon.fnal.gov/monitor-pp/). It can be used to compare job resource requests to actual usage, to help you optimize future job resource requests and make more efficient use of the grid.
As a general rule of thumb, the higher your resource requests, the longer it will take your jobs to start running. You can test these new features in pre-production and provide feedback. More Information
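One way to act on a requested-versus-used comparison like the one Fifemon shows is sketched below. It takes the peak memory observed across earlier jobs and adds a safety margin. The 20% headroom, the 100 MB rounding and the sample numbers are assumptions for illustration. They are not a FIFE recommendation, and `suggest_request` is a hypothetical helper, not part of Fifemon.

```python
import math


def suggest_request(observed_peaks_mb: list, headroom: float = 0.20,
                    round_to_mb: int = 100) -> int:
    """Suggest a memory request: highest observed peak plus headroom, rounded up."""
    if not observed_peaks_mb:
        raise ValueError("need at least one observed job")
    target = max(observed_peaks_mb) * (1 + headroom)
    return math.ceil(target / round_to_mb) * round_to_mb


if __name__ == "__main__":
    requested_mb = 2000               # hypothetical original request
    peaks = [610, 720, 655, 700]      # hypothetical per-job peak memory (MB)
    print(requested_mb, "->", suggest_request(peaks))  # 2000 -> 900
```

The key design choice is basing the request on the maximum rather than the average. Averaging would produce a smaller request, but the jobs that use more than the average would then risk exceeding their allocation.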
Optimizing job submissions
Carefully tailoring your resource requests will increase your job throughput
With partitionable slots now the norm on GPGrid, it’s important to have a good understanding of your jobs' resource requirements. There may be free slots with fewer resources available than the defaults, because they are leftovers from the way the cluster was partitioned at a given moment. As an analogy, imagine going to a busy restaurant (lots of red dots). If you insist on a table, you may have to wait, unless you’re willing to sit at the bar.
The same holds true in a computing environment. If you have a workflow that consistently uses fewer resources than the default requests, it makes sense to lower your resource requests to take advantage of small slots that the default request cannot use, just like sitting at the bar. By doing so, your jobs will start faster. More Information
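To make the restaurant analogy concrete, the sketch below counts how many leftover free slots a job could use right now at two different memory requests. The slot sizes and the 2000 MB "default" are invented for illustration. They are not GPGrid's actual defaults. The model is also simplified: it treats each leftover as a fixed-size seat, whereas a real partitionable slot is carved up dynamically by the batch system.

```python
# Hypothetical leftover slots on a busy cluster: (free cpus, free memory in MB)
FREE_SLOTS = [(1, 500), (1, 1200), (2, 1800), (1, 2500), (4, 8000), (1, 900)]


def usable_slots(cpus: int, memory_mb: int, slots=FREE_SLOTS) -> int:
    """Count free slots with enough CPUs and memory to start this job right now."""
    return sum(1 for free_cpus, free_mb in slots
               if free_cpus >= cpus and free_mb >= memory_mb)


if __name__ == "__main__":
    print("default request:", usable_slots(cpus=1, memory_mb=2000))  # 2
    print("trimmed request:", usable_slots(cpus=1, memory_mb=1000))  # 4
```

In this toy example, halving the memory request doubles the number of "seats" available immediately. The trimmed request only helps if the workflow really stays under 1000 MB, though, which is why comparing requests with actual usage matters.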
To provide feedback on any of these articles, or the FIFE notes in general, please email email@example.com.
The complete material (for viewing offline) is available in the following formats:
iPad, Nook (epub format)
Kindle (mobi format)
PDF (Lots of white space, though, so please consider before printing)