Cheetah documentation

By Nadia Zatsepin

Arizona State University

Abstract

The official Cheetah website is http://www.desy.de/~barty/cheetah/ 

These pages will not reproduce all the content from the Cheetah site, but are meant as an addendum.

For a list of all Cheetah's keywords, click here. 

 


What is Cheetah?

Cheetah is a set of programs for processing serial diffraction data from free electron laser sources. It lets you take home only the data with meaningful content, which is a sanity saver in many serial imaging experiments.

Cheetah is modular and can easily be adapted to any serial imaging data, including data collected using both free electron laser and synchrotron sources using a variety of detectors (including CSPAD, pnCCD, AGIPD, Pilatus, Rayonix).

The primary citation for Cheetah is:

A. Barty, R. A. Kirian, F. R. N. C. Maia, M. Hantke, C. H. Yoon, T. A. White, and H. N. Chapman, “Cheetah: software for high-throughput reduction and analysis of serial femtosecond X-ray diffraction data,” J Appl Crystallogr, vol. 47, pp. 1118–1131 (2014). doi:10.1107/S1600576714007626

Please cite this paper if you have used Cheetah or a part of Cheetah in your data analysis.


Downloading, compiling and installing Cheetah

Cheetah at LCLS

Step by step instructions for using the centrally installed Cheetah at LCLS: http://www.desy.de/~barty/cheetah/Cheetah/Configuration.html 

Cheetah at CFEL/DESY

Cheetah is installed in /cfel/common. Running

$ cheetah-gui

should just work, provided /cfel/common/bin is in your PATH. 

More instructions to follow (or ask Anton Barty)

Cheetah elsewhere

At any other location you will have to install Cheetah from scratch. Installing Cheetah itself is not too hard; however, installing the LCLS framework required to read XTC files directly can be an adventure. Your mileage may vary. Please see the developer pages for details on installing Cheetah from scratch.

Alternatively, if your data comes from somewhere other than LCLS, Cheetah can be called from code able to read any other file format: it is simply a matter of passing the frame data to Cheetah for processing. Once again, see the developer pages for more details.

Cheetah for developers

Cheetah is open-source and has been released under the GNU GPL v3 license. The latest releases and updates of Cheetah are best downloaded from the GitHub repository: https://github.com/antonbarty/cheetah/

Please follow the download instructions on that page: (assuming you have a version of git already installed)

> git clone https://github.com/antonbarty/cheetah.git

Please refer to the website for further details: http://www.desy.de/~barty/cheetah/Cheetah/Developers.html

Click here for Cheetah updates 


Running Cheetah

Cheetah at LCLS

The pre-installed Cheetah package at LCLS is in /reg/g/cfel/cheetah/cheetah-latest

Please follow the instructions at http://www.desy.de/~barty/cheetah/Cheetah/Cheetah_at_LCLS.html  for getting Cheetah running on your data at LCLS. Cheetah has a very handy GUI for launching batch hit finding jobs, keeping track of hit finding results, generating darkcals and bad pixel masks from them and viewing hits. 

This is the most reliable route for using Cheetah at LCLS. 

After you run tar -xvf /reg/g/cfel/cheetah/template.tar in your scratch/<username> directory, the sub-directories created include:

 

calib

Calibration files: beam (for beam files used by older versions of CrystFEL), darkcal – where you should store the darkcals created by Cheetah; gaincal – for gain calibration files; geometry – geometry files; and mask – where to store bad pixel masks, peak masks etc.

gui

Files needed by cheetah-gui. You will need to modify crawler.config  before running. Instructions on what to change are on the ‘Cheetah at LCLS’ web page.

hdf5

Output from Cheetah is saved here: HDF5 files (diffraction data and metadata) and copies of the hit finding configuration files.

A separate directory, rXXXX-<tag>, is created for each run and each tag (so you can try different hit finding parameters without overwriting earlier results). The amount of “clean” data grows quickly, so remember to delete all but your best hit finding results when finished.

XXXX is the run number and <tag> is the name of your ini file if you launch jobs from the Cheetah GUI, or a user-specified tag when launching from a terminal using

./process <run> <inifile.ini> <tag>  

(process, the script, can be found in your cheetah/process directory).

indexing

Location for output from CrystFEL indexing launched from the Cheetah GUI. See “lys.crystfel” script in the process directory.

process

Location for hit finding configuration (.ini) files.

    • “process” sets up the environment variables and launches Cheetah
    • “psana.cfg” is the configuration file for psana, the LCLS analysis framework (C++ and Python).
    • “lys.ini” is an example .ini file.
    • “darkcal.ini” – an ini file for generating a dark current measurement from a “dark” run; this doesn’t need to be edited.

The Cheetah GUI

See Getting started with Cheetah for an introduction to the Cheetah GUI.  


Cheetah hit finding configuration (.ini) files

Cheetah behavior is specified by the user through a configuration file. An example configuration file, lys.ini, is provided in the cheetah/process directory. A configuration file contains a list of “keywords” that Cheetah recognizes, together with their user-specified values. There are two types of keywords: “global” keywords that affect the analysis of all data, and “detector” keywords that affect only one particular detector.

Global keywords may be specified in the following way:

keyword = value  # comment

Note that whitespace is ignored completely, and everything following a # symbol is ignored. Keywords are not case sensitive. If a keyword is not recognized by Cheetah, the program will exit. Look in the log file (typically .. scratch/<username>/cheetah/hdf5/rXXXX-tag/log.txt) to find out which keyword was not recognized.

Detector keywords may be grouped together.  One way to group keywords is to use forward slashes, as follows:

group1/keyword1 = value
group1/keyword2 = value
group2/keyword1 = value
group2/keyword2 = value

The labels group1 and group2 can be any word.  An alternative way to specify groups is the following:

[group1]
keyword1 = value
keyword2 = value
[group2]
keyword1 = value
keyword2 = value

Generally, the use of brackets will simply prepend the group within the brackets to all subsequent keywords.  Empty brackets are allowed, which would specify global keywords.  Detector keywords that have not been assigned a group will automatically be assigned to the “first” detector.
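The syntax rules above (comments after #, whitespace ignored, case-insensitive keywords, bracketed groups prepended to later keywords) can be illustrated with a small Python sketch. This is not Cheetah's own parser: the function name parse_cheetah_ini and the group name "front" are made up for illustration, and the sketch does not check whether a keyword is one Cheetah recognizes, nor does it assign ungrouped detector keywords to the first detector.

```python
def parse_cheetah_ini(text):
    """Parse Cheetah-style configuration text into a {keyword: value} dict.

    Keys are lower-cased; grouped keywords are stored as 'group/keyword'.
    """
    settings = {}
    group = ""  # empty group means the keyword is global
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # [group] prefixes all following keywords; [] switches back to global
            group = "".join(line[1:-1].split()).lower()
            continue
        if "=" not in line:
            raise ValueError(f"cannot parse line: {raw_line!r}")
        key, value = line.split("=", 1)
        key = "".join(key.split()).lower()  # ignore whitespace and case
        if group:
            key = f"{group}/{key}"
        settings[key] = value.strip()
    return settings


example = """
hitfinderADC = 150        # global keyword
[front]                   # hypothetical group name
geometry = geometry/cspad_pixelmap.h5
darkcal  = darkcal.h5
[]
hitfinderMinSNR = 6
"""

for key, value in parse_cheetah_ini(example).items():
    print(key, "=", value)
```

With this representation, "front/geometry = ..." and a "geometry = ..." line under "[front]" end up as the same dictionary key, which mirrors the equivalence of the two grouping styles described above.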

Cheetah will ultimately be capable of performing peak finding / hit finding on multiple detectors.  At the moment, these operations will only be performed on the first detector in the configuration file.


Most commonly adjusted keywords in cheetah.ini

Configure cheetah.ini by

    1. Selecting the right detector:  see Detectors and Geometry and ask your beamline scientist to confirm.
    2. Selecting background processing options
    3. Tuning hit finding parameters

The following are the most important keywords you’ll probably ever want to tweak – the rest can likely be left alone. To read about all of Cheetah's keywords and hit finding algorithms, click here

Detector configuration

  • geometry (geometry/cspad_pixelmap.h5)

Calibration and masks

  • darkcal (darkcal.h5)
  • badPixelmap (badpixelmap.h5)
  • peakmask (peakmask.h5)

Background subtraction

  • useRadialBackgroundSubtraction (1)
  • useSubtractPersistentBackground (0)
  • useLocalBackgroundSubtraction (0)

Hit finding

  • hitfinderADC (150)
  • hitfinderMinSNR (6)
  • hitfinderNPeaks (20)
  • hitfinderNpeaksMax (5000)
  • hitfinderMinPixCount (2)
  • hitfinderMaxPixCount (20)
  • hitfinderLocalBgRadius (2)

Tuning hit finding parameters

Optimising crystal hit finding

  • Set hitfinderADC low enough to pick up weak peaks, but not so low that noise is found as peaks.
  • Is there a jet streak or a bad detector region? → put it in the peak mask
  • Too many spots in the solvent ring → increase hitfinderMinSNR or hitfinderMinPixCount
  • Too few spots overall → decrease hitfinderMinSNR and/or decrease the minimum number of pixels per peak, hitfinderMinPixCount (depending on whether the spots that are not being found are too small or too weak)
  • Blank frames with little noise, finding peaks all over the place → increase hitfinderADC (which acts as a floor on the ADC threshold computed from the radial SNR profile)
  • Still stuck with too many peaks → try restricting the radii over which hit finding is performed using hitfinderMinRes and hitfinderMaxRes (in pixels)
  • It is convenient to start a new .ini file for each type of sample. The name of the .ini file is used by the GUI to tag runs and update the table, and ends up as the tag name on the HDF5 directories created. Separate names help keep separate samples apart, and make it easy to copy/tar/grep directories based on sample name or other experiment parameters. This helps keep things organised. Use a symbolic link if the files are really the same.
  • Review your output. Often. No analysis should ever be done completely blind. Use the “Show hits” button to look at images and refine the hit finding parameters.
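To make the role of hitfinderADC as a floor more concrete, here is a minimal Python sketch. It assumes that the threshold in each radial bin is the background mean plus hitfinderMinSNR times the background sigma; that formula and the function name radial_thresholds are assumptions made for illustration, not a transcription of Cheetah's code.

```python
def radial_thresholds(bg_mean, bg_sigma, min_snr, adc_floor):
    """Per-radial-bin intensity thresholds, never lower than adc_floor.

    bg_mean, bg_sigma: background mean and sigma for each radial bin
    min_snr:           plays the role of hitfinderMinSNR
    adc_floor:         plays the role of hitfinderADC
    """
    return [
        max(adc_floor, mean + min_snr * sigma)  # assumed SNR threshold, floored
        for mean, sigma in zip(bg_mean, bg_sigma)
    ]


# A quiet inner bin and a noisier solvent-ring bin
print(radial_thresholds([10, 100], [5, 20], min_snr=6, adc_floor=150))
```

In the quiet bin the SNR-based threshold would be very low, so the floor takes over; this is why raising hitfinderADC suppresses spurious peaks in blank, low-noise frames.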

Optimising processing speed

  • Set nthreads to 16 (on LCLS and most other servers) or 72 on cfelsgi
  • Check I/O speed limit using ioSpeedTest
  • Turn off powder pattern creation (which skips mutex locks around summation of powder patterns)
  • Increase amount of time between calculation of running background (recalculation mutex blocks all worker threads) or turn off running background completely
  • Increase saveInterval
  • Set hitfinderFastScan to 1, so that only the inner 16 panels (of the CSPAD’s 64) are searched

Cheetah output files

Along with diffraction hits, virtual powder patterns and statistics, all configuration files necessary to reproduce your hitfinding result are copied into each hdf5 directory.

rXXXX-detectorX-class0-sum.h5:

Virtual powder pattern from frames not considered hits, i.e. the summation of intensities in rejected frames. Unless hdf5dump=1, frames contributing to this summation are not saved individually. This is useful to see if you are missing a lot of useful diffraction (real hits). You can view these sum.h5 files in the Cheetah GUI or using CrystFEL's hdfsee.

The HDF5 contents are explained below. Try viewing hdf5 datasets ending in “corrected_sigma” as peaks show up with much higher contrast than in the sum.

rXXXX-detectorX-class1-sum.h5:

The summation of hits, i.e. virtual powder pattern from hits.

.cxi file(s)  or  data1/ (data2/…) directories, containing HDF5 files 

If saveCXI=1 (default), all hits and corresponding metadata are saved in CXI format, i.e. in a single, large HDF5 as described in https://github.com/FilipeMaia/CXI/raw/master/cxi_file_format.pdf

If you set saveCXI=0 in the .ini file, individual HDF5 files are saved in data directories of up to 1000 small HDF5 files each. HDF5 filename: LCLS_year_monthday_rXXXX_hhmmss_tttt.h5
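When sorting or grouping individual hit files it can help to pull the run number and time stamp out of the file name. The following Python sketch does this for names of the form shown above. It assumes English three-letter month abbreviations (as in the example name LCLS_2013_Feb12_r0194_053144_17343.h5); the meaning of the trailing tttt field is not interpreted and is returned unchanged. The function name parse_hit_filename is hypothetical.

```python
import re
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# LCLS_year_monthday_rXXXX_hhmmss_tttt.h5
HIT_NAME = re.compile(
    r"^LCLS_(\d{4})_([A-Za-z]{3})(\d{2})_r(\d+)_(\d{2})(\d{2})(\d{2})_(\w+)\.h5$"
)


def parse_hit_filename(name):
    """Split a single-hit HDF5 file name into run number, time stamp and suffix."""
    match = HIT_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    year, month, day, run, hh, mm, ss, suffix = match.groups()
    timestamp = datetime(int(year), MONTHS.index(month.capitalize()) + 1,
                         int(day), int(hh), int(mm), int(ss))
    return {"run": int(run), "timestamp": timestamp, "suffix": suffix}


print(parse_hit_filename("LCLS_2013_Feb12_r0194_053144_17343.h5"))
```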

A short description of the HDF5 file content / structure can be found further below. 

If you ran darkcal.ini, no data1, data2, etc. directories will be created. The dark current measurement (averaged over the whole dark run) will be in a file called cxiXXXXX-rXXXX-detectorX-darkcal.h5. For easier reference, copy this to your cheetah/calib/darkcal directory (feel free to rename it, but keep track of which run it came from and, if using multiple detectors, which detector). Update your ini files to point to the new darkcal. Always use the darkcal nearest to your sample runs. If in doubt, use a later one.
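The rule of thumb for choosing a darkcal can be written down as a short Python sketch. It assumes that "nearest" can be judged by run number and reads "if in doubt, use a later one" as a tie-break; both are interpretations made for illustration, and the function name pick_darkcal_run is hypothetical.

```python
def pick_darkcal_run(sample_run, darkcal_runs):
    """Return the dark run closest to sample_run, preferring the later one on a tie."""
    if not darkcal_runs:
        raise ValueError("no darkcal runs available")
    # Sort key: distance first, then the negative run number so that the
    # later (larger) run wins when two dark runs are equally close.
    return min(darkcal_runs, key=lambda dark: (abs(dark - sample_run), -dark))


print(pick_darkcal_run(20, [10, 18, 22, 40]))
```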

A cxiXXXX-rXXXX.cxi file with shot-by-shot metadata will still be created even if you run darkcal.ini. You can view its contents in hdfsee or some other HDF5 viewer, or with h5dump -d <dataset name>. (Run h5dump -n <file.cxi> first to see what the datasets are, before trying to dump potentially gigabytes of text.)

 

frames.txt:

Frames.txt contains a list of all detector readout events (hits and non-hits) with various attributes.

eventData->________            meaning:

   1. eventName                HDF5 filename: LCLS_year_monthday_rXXXX_hhmmss_tttt.h5 (if saveCXI=0)
   2. filename                 “---” if non-hit; “data*/filename” if hit
   3. stackSlice
   4. xtcFrameNumber
   5. hit
   6. powderClass              0 = non-hits; 1 = hits
   7. hitScore
   8. photonEnergyeV
   9. wavelength               Å
  10. gmd1                     gas monitor detector 1 (for incident flux measurement)
  11. gmd2                     gas monitor detector 2 (for incident flux measurement)
  12. detector[0].detectorZ    sum of “detectorZpvname” and “cameraLengthOffset” in the .ini file
  13. energySpectrumExist      was the spectrometer in place and recorded to the datastream?
  14. nPeaks                   number of peaks found
  15. peakNpix                 total number of pixels that contribute to peaks in the pattern
  16. peakTotal                total intensity of all peak pixels
  17. peakResolution           in pixels
  18. peakDensity
  19. pumpLaserCode            process variable where the laser trigger is recorded (evr41, evr183, LD57)
  20. pumpLaserDelay
  21. pumpLaserOn              trigger for pump laser experiments

cleaned.txt

cleaned.txt contains information about the hits only, with fewer columns than frames.txt.

 

  1. Filename                 info->eventname
  2. frameNumber              threadNum
  3. npeaks                   info->nPeaks
  4. nPixels                  info->peakNpix
  5. totalIntensity           info->peakTotal
  6. peakResolution           info->peakResolution    (pixels)
  7. peakResolutionA          info->peakResolutionA
  8. peakDensity              info->peakDensity

rXXXX-class0-log.txt   and  rXXXX-class1-log.txt

Lists of files contributing to rXXXX-detectorX-classX-sum.h5.

Similar to frames.txt and cleaned.txt, but class0 = non-hits and class1 = hits (so cleaned.txt and class1-log.txt will contain the same files). The columns are:

eventData->eventname, eventData->filename, eventData->stackSlice, eventData->xtcFrameNumber, eventData->hitScore, eventData->photonEnergyeV, eventData->wavelengthA, eventData->detector[0].detectorZ, eventData->gmd1, eventData->gmd2, eventData->energySpectrumExist, eventData->nPeaks, eventData->peakNpix, eventData->peakTotal, eventData->peakResolution, eventData->peakDensity, eventData->pumpLaserCode, eventData->pumpLaserDelay 
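As an illustration of how the two class logs can be used together, the following Python sketch reads both files into lists of dictionaries keyed by the column names above and derives a hit rate from the row counts. It assumes the files are comma-separated with one event per line, and that any header line either starts with # or contains "->"; neither detail is confirmed here, so check a real log file first. The function names are hypothetical.

```python
COLUMNS = [
    "eventname", "filename", "stackSlice", "xtcFrameNumber", "hitScore",
    "photonEnergyeV", "wavelengthA", "detectorZ", "gmd1", "gmd2",
    "energySpectrumExist", "nPeaks", "peakNpix", "peakTotal",
    "peakResolution", "peakDensity", "pumpLaserCode", "pumpLaserDelay",
]


def read_class_log(lines):
    """Turn the lines of a class0/class1 log into a list of {column: text} dicts."""
    rows = []
    for line in lines:
        line = line.strip()
        # Skip blanks and anything that looks like a header (assumed conventions)
        if not line or line.startswith("#") or "->" in line:
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != len(COLUMNS):
            continue  # not a data row in the assumed layout
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def hit_rate(class0_rows, class1_rows):
    """Fraction of processed frames that were classified as hits."""
    total = len(class0_rows) + len(class1_rows)
    return len(class1_rows) / total if total else 0.0
```

Typical use would be to pass open file objects for rXXXX-class0-log.txt and rXXXX-class1-log.txt to read_class_log and then compare the result of hit_rate with the summary at the end of log.txt.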

darkcal.h5

Copy of the dark current measurement specified in original.ini as darkcal.

geometry.h5

Copy of pixel map specified in the .ini file under geometry.

log.txt

Progress of hit finding, updated at the rate set by keyword saveInterval. If hit finding has finished, a summary is appended, including total frames processed, number of hits, hit rate, average photon energy and its sigma.

bsub.log

Log from batch job submission. Look here for errors when hit finding doesn’t work. It will report misuse of keywords and other problems.

original.ini

Copy of your original ini file (renamed).

cheetah.ini

Same as original.ini  but with commented lines removed.

cheetah.out

Full list of parameters used by Cheetah for this hit finding. You can see here if any of your keyword values from cheetah.ini were overwritten automatically due to clashes.

peakmask.h5

Copy of mask used while peak finding. See keyword peakmask.

peaks.txt

Space separated column file with peak information. One line per peak.

frameNumber, eventName, photonEnergyEv, wavelengthA, GMD, peak_index, peak_x_raw, peak_y_raw, peak_r_assembled, peak_q, peak_resA, nPixels, totalIntensity, maxIntensity, sigmaBG, SNR

Where GMD is a gas monitoring detector (proportional to incident flux), and peak_index: is

psana.cfg

Copy of psana.cfg from your cheetah/process directory: the configuration file for psana, the LCLS analysis framework.

status.txt

Status of hit finding.

xtcfiles.txt

List of xtc files from this run. If you started your hit finding before all the xtc files had finished writing to the offline storage, this list may be incomplete. Cheetah will not report an error in that case, but rerunning it at a later date will show more frames processed. The raw data are saved in /reg/d/psdm/cxi/cxiXXXXX/xtc (XXXXX = the experiment ID, with the last 2 digits corresponding to the year of the experiment). While the xtc files are being written, .inprogress is appended to their names, and Cheetah deliberately excludes them until the extension is solely .xtc.
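Before launching hit finding on a fresh run, it can be worth checking whether any files are still being written. The Python sketch below separates finished files from those still carrying the .inprogress suffix. It works on a plain list of file names (for example from os.listdir on the xtc directory); the function name and the example file names are hypothetical.

```python
def split_xtc_files(names):
    """Return (complete, in_progress) lists of xtc file names, each sorted."""
    complete = sorted(n for n in names if n.endswith(".xtc"))
    in_progress = sorted(n for n in names if n.endswith(".inprogress"))
    return complete, in_progress


# Hypothetical directory listing
listing = ["run5-s00.xtc", "run5-s01.xtc.inprogress", "notes.txt"]
done, pending = split_xtc_files(listing)
if pending:
    print(f"{len(pending)} file(s) still being written; xtcfiles.txt may be incomplete")
```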


Cheetah output HDF5 contents

This is what may be referred to as "cleaned" data.

In serial femtosecond crystallography, you will typically hit each crystal only once (unless the crystals are large and flowing slowly). Each diffraction pattern that Cheetah finds in the raw data stream from LCLS is saved as an individual HDF5 file if the keyword saveCXI is set to 0. If saveCXI=1, all the hits are saved into one large HDF5 file in "CXI format" as described in https://github.com/FilipeMaia/CXI/raw/master/cxi_file_format.pdf

 

HDF5 files are hierarchical, consisting of groups and datasets. Groups can contain other groups and datasets, while datasets can contain multi-dimensional data (e.g. diffraction data). More on HDF5 files on the HDF5 page.

If saveCXI=0

Each HDF5 file will have the following contents, called datasets (links are also allowed within an HDF5 file):

LCLS_2013_Feb12_r0194_053144_17343.h5

/LCLS                     /data                     /processing
/detector0-EncoderValue   /data                     /energySpectrum-tilt
/detector0-Position       /energySpectrum1D         /hitfinder
/detector1-EncoderValue   /energySpectrumCCD          /peakinfo
/detector1-Position       /energySpectrumScale        /peakinfo-assembled
/ebeamCharge              /radialAverage0             /peakinfo-raw
/ebeamL3Energy            /radialAverage1           /pixelmasks
/ebeamLTUAngX             /radialAverageCounter0
/ebeamLTUAngY             /radialAverageCounter1
/ebeamLTUPosX             /rawdata
/ebeamLTUPosY             /rawdata0
/ebeamPkCurrBC2           /rawdata1
/eventTimeString
/evr41
/f_11_ENRC
/f_12_ENRC
/f_21_ENRC
/f_22_ENRC
/fiducial
/machineTime
/phaseCavityCharge1
/phaseCavityCharge2
/phaseCavityTime1
/phaseCavityTime2
/photon_energy_eV
/photon_wavelength_A

This table shows the structure of the virtual powder patterns, rXXXX-detectorX-classX-sum.h5 

R0016-detector0-class1-sum.h5

/data   (group)                                        /data   (link)
/nframes                                               /correcteddata --> /data/non_assembled_detector_corrected
/non_assembled_detector_and_photon_corrected           /data --> /data/non_assembled_detector_corrected
/non_assembled_detector_and_photon_corrected_sigma
/non_assembled_detector_corrected
/non_assembled_detector_corrected_sigma
/peakpowder
/radial_average_detector_and_photon_corrected
/radial_average_detector_and_photon_corrected_sigma
/radial_average_detector_corrected
/radial_average_detector_corrected_sigma

Other 2D datasets may be created when other "savePowder" keywords are set to 1 in the ini file. 

 

If saveCXI=1 (default)

All hits and corresponding metadata are saved in CXI format in a single HDF5 file. The structure of this .cxi file is described under “CXI stack of images from a modular detector” in the CXI format documents at www.cxidb.org.


Cheetah GUI

For LCLS users:

The Cheetah website has detailed instructions for setting up your experimental directories to use the centrally installed Cheetah: http://www.desy.de/~barty/cheetah/Cheetah/Configuration.html

Scripts

Scripts to help with miscellaneous tasks while hit finding and doing preliminary analysis can be downloaded from https://www.bioxfel.org/resources/scripts

peakogram

Quickly plot a histogram of all peaks found in run. Good way to see if you have a lot of saturation and to estimate resolution from the whole run. Uses peaks.txt.
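The idea behind such a histogram can be sketched in a few lines of Python. This is not the peakogram script itself, only a rough illustration that bins peaks by resolution using the peaks.txt column order listed earlier on this page. It assumes one peak per line with fields separated by spaces (commas are tolerated), and it skips any line that does not have 16 fields or whose numeric fields do not parse, such as a header line. The function names are hypothetical.

```python
from collections import Counter

PEAK_COLUMNS = [
    "frameNumber", "eventName", "photonEnergyEv", "wavelengthA", "GMD",
    "peak_index", "peak_x_raw", "peak_y_raw", "peak_r_assembled", "peak_q",
    "peak_resA", "nPixels", "totalIntensity", "maxIntensity", "sigmaBG", "SNR",
]


def read_peaks(lines):
    """Return one dict per peak with peak_resA and maxIntensity as floats."""
    peaks = []
    for line in lines:
        fields = line.replace(",", " ").split()
        if len(fields) != len(PEAK_COLUMNS):
            continue  # blank, truncated or otherwise unexpected line
        row = dict(zip(PEAK_COLUMNS, fields))
        try:
            row["peak_resA"] = float(row["peak_resA"])
            row["maxIntensity"] = float(row["maxIntensity"])
        except ValueError:
            continue  # most likely a header line
        peaks.append(row)
    return peaks


def histogram(values, bin_width):
    """Count values per bin; keys are the lower edges of the bins."""
    counts = Counter(int(value // bin_width) for value in values)
    return {index * bin_width: n for index, n in sorted(counts.items())}
```

For example, histogram([p["peak_resA"] for p in peaks], 0.5) gives peak counts per 0.5 Å resolution bin, and counting the peaks whose maxIntensity exceeds a detector-specific level of your choosing gives a rough feel for saturation.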

hits

A script to quickly calculate hits and hitrates from SFX experiments

stream2stats

A script to quickly calculate data quality metrics from SFX experiments

visibly_bad_mask.py

A script to generate masks (bad pixel or peak) manually. Useful for shadows, rings from substrate/sample holder. The resultant binary (0/1) hdf5 mask needs<
