Cheetah documentation

Arizona State University

Data Analysis

This resource belongs to the Data Analysis group.

The official Cheetah website is http://www.desy.de/~barty/cheetah/

These pages will not reproduce all the content from the Cheetah site, but are meant as an addendum.

For a list of all Cheetah's keywords, click here.

What is Cheetah?

Cheetah is a set of programs for processing serial diffraction data from free electron laser sources. It lets you take home only the data with meaningful content, which is a sanity saver in many serial imaging experiments.

Cheetah is modular and can easily be adapted to any serial imaging data, including data collected using both free electron laser and synchrotron sources using a variety of detectors (including CSPAD, pnCCD, AGIPD, Pilatus, Rayonix).

The primary citation for Cheetah is:

A. Barty, R. A. Kirian, F. R. N. C. Maia, M. Hantke, C. H. Yoon, T. A. White, and H. N. Chapman, “Cheetah: software for high-throughput reduction and analysis of serial femtosecond X-ray diffraction data,” J. Appl. Crystallogr., vol. 47, pp. 1118–1131 (2014). doi:10.1107/S1600576714007626

Please cite this paper if you have used Cheetah or a part of Cheetah in your data analysis.

Downloading, compiling and installing Cheetah

Cheetah at LCLS

Step by step instructions for using the centrally installed Cheetah at LCLS: http://www.desy.de/~barty/cheetah/Cheetah/Configuration.html

Cheetah at CFEL/DESY

Cheetah is installed in /cfel/common. Running

$ cheetah-gui

should just work, provided /cfel/common/bin is in your PATH.

More instructions to follow (or ask Anton Barty)

Cheetah elsewhere

At any other location you will have to install Cheetah from scratch. Installing Cheetah itself is not too hard; however, installing the LCLS framework required to read XTC files directly can be an adventure. Your mileage may vary. Please see the developer pages for details on installing Cheetah from scratch.

Alternatively, if your data comes from somewhere other than LCLS, Cheetah can be called from code able to read any other file format: it is simply a matter of passing the frame data to Cheetah for processing. Once again, see the developer pages for more details.

Cheetah for developers

Cheetah is open-source and has been released under the GNU GPL v3 license. The latest releases and updates to Cheetah are best downloaded from the GitHub repository: https://github.com/antonbarty/cheetah/

Please follow the download instructions on that page (assuming you already have git installed). GitHub no longer serves the unauthenticated git:// protocol, so clone over HTTPS:

$ git clone https://github.com/antonbarty/cheetah.git

Please refer to the website for further details: http://www.desy.de/~barty/cheetah/Cheetah/Developers.html

Click here for Cheetah updates

Running Cheetah

Cheetah at LCLS

The pre-installed Cheetah package at LCLS is in /reg/g/cfel/cheetah/cheetah-latest

Please follow the instructions at http://www.desy.de/~barty/cheetah/Cheetah/Cheetah_at_LCLS.html for getting Cheetah running on your data at LCLS. Cheetah has a very handy GUI for launching batch hit finding jobs, keeping track of hit finding results, generating darkcals and bad pixel masks from them and viewing hits.

This is the most reliable route for using Cheetah at LCLS.

After you run tar -xvf /reg/g/cfel/cheetah/template.tar in your scratch/<username> directory, the sub-directories created include:

calib

Calibration files, sorted into sub-directories: beam (beam files used by older versions of CrystFEL); darkcal (where you should store the darkcals created by Cheetah); gaincal (gain calibration files); geometry (geometry files); and mask (where to store bad pixel masks, peak masks, etc.).

gui

Files needed by cheetah-gui. You will need to modify crawler.config before running. Instructions on what to change are on the ‘Cheetah at LCLS’ web page.

hdf5

Output from Cheetah is saved here: HDF5 files (diffraction data plus metadata) and a number of hit finding configuration files.

A separate directory, rXXXX-<tag>, is created for each run and each tag (so you can try different hit finding parameters without overwriting earlier results). The amount of “clean” data grows quickly, so remember to delete all but your best hit finding results when finished.

XXXX is the run number and <tag> is the name of your ini file if you launch jobs from the Cheetah GUI, or a user-specified tag when launching from a terminal using

./process <run> <inifile.ini> <tag>

(process, the script, can be found in your cheetah/process directory).
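As a small illustration of the naming scheme above, the following Python sketch builds the expected output directory name and the matching command line for a run. The four-digit zero padding and the plain hyphen separator are inferred from names such as r0194 and rXXXX-detectorX used elsewhere on this page, and the helper names are hypothetical; none of this is part of Cheetah itself.

```python
def run_directory(run: int, tag: str) -> str:
    """Return the expected hdf5 sub-directory name, e.g. r0194-lys.

    Assumes a four-digit, zero-padded run number (as in r0194).
    """
    return f"r{run:04d}-{tag}"


def process_command(run: int, ini_file: str, tag: str) -> list[str]:
    """Build the argument list for: ./process <run> <inifile.ini> <tag>."""
    return ["./process", str(run), ini_file, tag]


# Example: run 194 processed with lys.ini and tagged "lys".
print(run_directory(194, "lys"))
print(" ".join(process_command(194, "lys.ini", "lys")))
```

Using a different tag for each set of hit finding parameters gives each attempt its own directory, which is what makes side-by-side comparison possible.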

indexing

Location for output from CrystFEL indexing launched from the Cheetah GUI. See “lys.crystfel” script in the process directory.

process

Location for hit finding configuration (.ini) files.

“process” sets up the environment variables and launches Cheetah.
“psana.cfg” is the configuration file for psana, the LCLS analysis framework (C++ and Python).
“lys.ini” is an example .ini file.
“darkcal.ini” is an .ini file for generating a dark current measurement from a “dark” run; it doesn’t need to be edited.

The Cheetah GUI

See Getting started with Cheetah for an introduction to the Cheetah GUI.

Cheetah hit finding configuration (.ini) files

Cheetah's behavior is specified by the user through a configuration file. An example configuration file, lys.ini, is provided in the cheetah/process directory. A configuration file is a list of “keywords” that Cheetah recognizes, each with a user-specified value. There are two types of keywords: “global” keywords that affect the analysis of all data, and “detector” keywords that affect only one particular detector.

Global keywords may be specified in the following way:

keyword = value # comment

Whitespace is ignored completely, as is everything following a # symbol. Keywords are not case sensitive. If Cheetah does not recognize a keyword, the program will exit; look in the log file (typically .. scratch/<username>/cheetah/hdf5/rXXXX-tag/log.txt) to see which keyword was not recognized.

Detector keywords may be grouped together. One way to group keywords is to use forward slashes, as follows:

group1/keyword1 = value
group1/keyword2 = value
group2/keyword1 = value
group2/keyword2 = value

The labels group1 and group2 can be any word. An alternative way to specify groups is the following:

[group1]
keyword1 = value
keyword2 = value
[group2]
keyword1 = value
keyword2 = value

Generally, the use of brackets will simply prepend the group within the brackets to all subsequent keywords. Empty brackets are allowed, which would specify global keywords. Detector keywords that have not been assigned a group will automatically be assigned to the “first” detector.
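The syntax rules above can be summarised in a short Python sketch. This is not Cheetah's own parser, only a reading of the rules as stated on this page: whitespace is dropped, anything after # is a comment, keywords are lower-cased, and a bracketed group is prepended to the keywords that follow until the next bracket. The function name is hypothetical, and lower-casing the group label is an assumption.

```python
def parse_cheetah_ini(text: str) -> dict[str, str]:
    """Parse 'keyword = value' lines into a flat {"group/keyword": value} dict."""
    settings: dict[str, str] = {}
    group = ""  # empty string means global keywords (no prefix)
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0]   # everything after '#' is ignored
        line = "".join(line.split())       # whitespace is ignored completely
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # [group] prefixes the following keywords; [] returns to global.
            group = line[1:-1].lower()     # assumption: groups are case-insensitive too
            continue
        if "=" not in line:
            raise ValueError(f"cannot parse line: {raw_line!r}")
        key, value = line.split("=", 1)
        key = key.lower()                  # keywords are not case sensitive
        settings[f"{group}/{key}" if group else key] = value
    return settings


example = """
keyword = value # comment
group1/keyword1 = a
[group2]
Keyword1 = b
keyword2 = c
[]
other = d
"""
parsed = parse_cheetah_ini(example)
print(parsed)
```

Both grouping styles end up as the same flat group/keyword names, which is why they can be mixed freely in one file.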

Cheetah will ultimately be capable of performing peak finding / hit finding on multiple detectors. At the moment, these operations will only be performed on the first detector in the configuration file.

Most commonly adjusted keywords in cheetah.ini

Configure cheetah.ini by

Selecting the right detector: see Detectors and Geometry and ask your beamline scientist to confirm.
Selecting background processing options
Tuning hit finding parameters

The following are the most important keywords you’ll probably ever want to tweak – the rest can likely be left alone. To read about all of Cheetah's keywords and hit finding algorithms, click here.

Detector configuration

geometry (geometry/cspad_pixelmap.h5)

Calibration and masks

darkcal (darkcal.h5)
badPixelmap (badpixelmap.h5)
peakmask (peakmask.h5)

Background subtraction

useRadialBackgroundSubtraction (1)
useSubtractPersistentBackground (0)
useLocalBackgroundSubtraction (0)

Hit finding

hitfinderADC (150)
hitfinderMinSNR (6)
hitfinderNPeaks (20)
hitfinderNpeaksMax (5000)
hitfinderMinPixCount (2)
hitfinderMaxPixCount (20)
hitfinderLocalBgRadius (2)
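To show how these keywords look once written out as keyword = value lines, here is a minimal Python sketch that renders them into .ini text. The values are simply the ones shown in parentheses above and serve as placeholders; the file paths in particular must be replaced with your own. The render_ini helper is hypothetical and not part of Cheetah.

```python
# Keywords and values copied from the lists above (placeholders only).
COMMON_KEYWORDS = {
    "geometry": "geometry/cspad_pixelmap.h5",
    "darkcal": "darkcal.h5",
    "badPixelmap": "badpixelmap.h5",
    "peakmask": "peakmask.h5",
    "useRadialBackgroundSubtraction": 1,
    "useSubtractPersistentBackground": 0,
    "useLocalBackgroundSubtraction": 0,
    "hitfinderADC": 150,
    "hitfinderMinSNR": 6,
    "hitfinderNPeaks": 20,
    "hitfinderNpeaksMax": 5000,
    "hitfinderMinPixCount": 2,
    "hitfinderMaxPixCount": 20,
    "hitfinderLocalBgRadius": 2,
}


def render_ini(keywords: dict[str, object]) -> str:
    """Render a keyword dict as 'keyword = value' lines, one per keyword."""
    return "".join(f"{name} = {value}\n" for name, value in keywords.items())


print(render_ini(COMMON_KEYWORDS))
```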

Tuning hit finding parameters

Optimising crystal hit finding

Set hitfinderADC low enough, but not too low.
Jet streak or bad detector region present -> put it in the peak mask.
Too many spots in the solvent ring -> increase hitfinderMinSNR or hitfinderMinPixCount.
Too few spots overall -> decrease hitfinderMinSNR and/or the minimum number of pixels per peak (hitfinderMinPixCount), depending on whether the spots being missed are too weak or too small.
Blank frames with little noise, but peaks found all over the place -> increase hitfinderADC (which acts as a floor on the ADC threshold computed from the radial SNR profile).
Still stuck with too many peaks -> try restricting the radii over which hit finding is performed using hitfinderMinRes and hitfinderMaxRes (in pixels).
Start a new .ini file for each type of sample. The name of the .ini file is used by the GUI to tag runs and update the table, and ends up as the tag name on the HDF5 directories created. Separate names help keep samples apart and make it easy to copy/tar/grep directories based on sample name or other experiment parameters. Use a symbolic link if the files are really the same.
Review your output. Often. No analysis should ever be done completely blind. Use the “Show hits” button to look at images and refine the hit finding parameters.
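The tuning advice above is easier to follow with a mental model of how the thresholds interact. The Python sketch below is a deliberately simplified illustration and not Cheetah's actual peak finder: it assumes that a peak is kept when its brightest pixel reaches hitfinderADC, its SNR reaches hitfinderMinSNR and its pixel count lies between hitfinderMinPixCount and hitfinderMaxPixCount, and that a frame is a hit when the number of kept peaks lies between hitfinderNPeaks and hitfinderNpeaksMax. Whether the real comparisons are inclusive is an assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    n_pixels: int         # connected pixels in the peak
    max_intensity: float  # brightest pixel, in ADC units
    snr: float            # signal-to-noise ratio of the peak


@dataclass
class HitfinderSettings:
    adc: float = 150          # hitfinderADC
    min_snr: float = 6        # hitfinderMinSNR
    min_pix_count: int = 2    # hitfinderMinPixCount
    max_pix_count: int = 20   # hitfinderMaxPixCount
    n_peaks: int = 20         # hitfinderNPeaks
    n_peaks_max: int = 5000   # hitfinderNpeaksMax


def accepted_peaks(peaks: list[Peak], s: HitfinderSettings) -> list[Peak]:
    """Keep peaks that pass the intensity, SNR and size cuts (inclusive bounds assumed)."""
    return [
        p for p in peaks
        if p.max_intensity >= s.adc
        and p.snr >= s.min_snr
        and s.min_pix_count <= p.n_pixels <= s.max_pix_count
    ]


def is_hit(peaks: list[Peak], s: HitfinderSettings) -> bool:
    """A frame counts as a hit when the accepted peak count is within range."""
    return s.n_peaks <= len(accepted_peaks(peaks, s)) <= s.n_peaks_max
```

In this picture, raising the ADC or SNR threshold removes weak peaks from the count, while the pixel-count limits remove single-pixel noise and large blobs such as jet streaks.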

Optimising processing speed

Set nthreads to 16 (on LCLS and most other servers) or 72 on cfelsgi
Check I/O speed limit using ioSpeedTest
Turn off powder pattern creation (which skips mutex locks around summation of powder patterns)
Increase amount of time between calculation of running background (recalculation mutex blocks all worker threads) or turn off running background completely
Increase saveInterval
Set hitfinderFastScan to 1: it will search only the inner 16 panels (of CSPAD’s 64)

Cheetah output files

Along with diffraction hits, virtual powder patterns and statistics, all configuration files necessary to reproduce your hitfinding result are copied into each hdf5 directory.

rXXXX-detectorX-class0-sum.h5:

Virtual powder pattern from frames not considered hits, i.e. the summation of intensities in rejected frames. Unless hdf5dump=1, frames contributing to this summation are not saved individually. This is useful to see if you are missing a lot of useful diffraction (real hits). You can view these sum.h5 files in the Cheetah GUI or using CrystFEL's hdfsee.

The HDF5 contents are explained below. Try viewing hdf5 datasets ending in “corrected_sigma” as peaks show up with much higher contrast than in the sum.

rXXXX-detectorX-class1-sum.h5:

The summation of hits, i.e. virtual powder pattern from hits.

.cxi file(s) or data1/ (data2/…) directories, containing HDF5 files

If saveCXI=1 (default), all hits and corresponding metadata are saved in CXI format, i.e. in a single, large HDF5 as described in https://github.com/FilipeMaia/CXI/raw/master/cxi_file_format.pdf

If you set saveCXI=0 in the .ini file, hits are saved as individual HDF5 files in data directories of up to 1000 small HDF5 files each. HDF5 filename: LCLS_year_monthday_rXXXX_hhmmss_tttt.h5

A short description of the HDF5 file content / structure can be found further below.
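When sorting or filtering individual hit files, it can help to split the filename into its parts. The Python sketch below follows the LCLS_year_monthday_rXXXX_hhmmss_tttt.h5 pattern given above, using the example name LCLS_2013_Feb12_r0194_053144_17343.h5 from this page as a guide to field widths. The regular expression and function name are illustrative assumptions, and the meaning of the final tttt field is not stated here, so it is left as a string.

```python
import re

# Pattern inferred from LCLS_year_monthday_rXXXX_hhmmss_tttt.h5
FILENAME_RE = re.compile(
    r"LCLS_(?P<year>\d{4})_(?P<monthday>[A-Za-z]{3}\d{1,2})"
    r"_r(?P<run>\d+)_(?P<hhmmss>\d{6})_(?P<tttt>\d+)\.h5"
)


def parse_hit_filename(name: str) -> dict[str, object]:
    """Split a hit filename into its fields; the run number becomes an int."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised hit filename: {name!r}")
    fields: dict[str, object] = dict(match.groupdict())
    fields["run"] = int(match.group("run"))
    return fields


print(parse_hit_filename("LCLS_2013_Feb12_r0194_053144_17343.h5"))
```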

If you ran darkcal.ini, no data1, data2, etc. directories will be created. The dark current measurement (averaged over the whole dark run) will be in a file called cxiXXXXX-rXXXX-detectorX-darkcal.h5. Copy this to your cheetah/calib/darkcal directory for easier reference (feel free to rename it, but keep track of which run it came from and, if using multiple detectors, which detector). Update your ini files to point to the new darkcal. Always use the darkcal nearest to your sample runs. If in doubt, use a later one.

A cxiXXXX-rXXXX.cxi file with shot-by-shot metadata will still be created even if you run darkcal.ini. You can view its contents in hdfsee or another HDF5 viewer, or with h5dump -d <dataset name>. (Run h5dump -n <file.cxi> first to see what the datasets are, before trying to dump potentially gigabytes of text.)

frames.txt:

Frames.txt contains a list of all detector readout events (hits and non-hits) with various attributes.

eventData->________       meaning:

eventName                 HDF5 filename: LCLS_year_monthday_rXXXX_hhmmss_tttt.h5 (if saveCXI=0)
filename                  “---” if non hit; “data*/filename” if hit
stackSlice
xtcFrameNumber
hit
powderClass               0 = non hits; 1 = hits
hitScore
photonEnergyeV
wavelengthA               wavelength in Å
gmd1                      gas monitor detector 1 (for incident flux measurement)
gmd2                      gas monitor detector 2 (for incident flux measurement)
detector[0].detectorZ     sum of “detectorZpvname” and “cameraLengthOffset” in .ini file
energySpectrumExist       was the spectrometer in place and recorded to the datastream?
nPeaks                    number of peaks found
peakNpix                  total number of pixels that contribute to peaks in the pattern
peakTotal                 total intensity of all peak pixels
peakResolution            in pixels
peakDensity
pumpLaserCode             process variable where the laser trigger is recorded (evr41, evr183, LD57)
pumpLaserDelay
pumpLaserOn               trigger for pump laser experiments
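A quick use of frames.txt is to compute the hit rate for a run. The Python sketch below assumes that the file is comma-separated and begins with a header line starting with # that names the columns with an eventData-> prefix (similar to the column list given for the class log files); neither detail is confirmed on this page, so check your own file first. The function names and the two sample rows are made up for illustration.

```python
def read_frames(text: str) -> list[dict[str, str]]:
    """Return one {column: value} dict per event from frames.txt-style text."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Assumed header: "# eventData->eventName, eventData->filename, ..."
            names = line.lstrip("#").split(",")
            columns = [n.strip().removeprefix("eventData->") for n in names]
            continue
        if columns is None:
            raise ValueError("no header line found before the data")
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(columns, values)))
    return rows


def hit_rate(rows: list[dict[str, str]]) -> float:
    """Fraction of events whose 'hit' column equals 1 (0.0 for an empty list)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if int(r["hit"]) == 1) / len(rows)


# Made-up sample with only four of the columns, for illustration.
sample = """# eventData->eventName, eventData->filename, eventData->hit, eventData->nPeaks
eventA.h5, data1/eventA.h5, 1, 42
eventB.h5, ---, 0, 3
"""
rows = read_frames(sample)
print(f"{len(rows)} events, hit rate {hit_rate(rows):.0%}")
```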

cleaned.txt

Cleaned.txt contains information about the hits only, with fewer columns than frames.txt.

Filename info->eventname
frameNumber threadNum
npeaks info->nPeaks
nPixels info->peakNpix
totalIntensity info->peakTotal
peakResolution info->peakResolution (pixels)
peakResolutionA info->peakResolutionA
peakDensity info->peakDensity

rXXXX-class0-log.txt and rXXXX-class1-log.txt

Lists of files contributing to rXXXX-detectorX-classX-sum.h5.

Similar to frames.txt and cleaned.txt but class0 = non hits, and class1 = hits. (so cleaned.txt and class1-log.txt will contain the same files). The columns are:

eventData->eventname, eventData->filename, eventData->stackSlice, eventData->xtcFrameNumber, eventData->hitScore, eventData->photonEnergyeV, eventData->wavelengthA, eventData->detector[0].detectorZ, eventData->gmd1, eventData->gmd2, eventData->energySpectrumExist, eventData->nPeaks, eventData->peakNpix, eventData->peakTotal, eventData->peakResolution, eventData->peakDensity, eventData->pumpLaserCode, eventData->pumpLaserDelay

darkcal.h5

Copy of the dark current measurement specified in original.ini as darkcal.

geometry.h5

Copy of pixel map specified in the .ini file under geometry.

log.txt

Progress of hit finding, updated at the rate set by keyword saveInterval. If hit finding has finished, a summary is appended, including total frames processed, number of hits, hit rate, average photon energy and its sigma.

bsub.log

Log from batch job submission. Look here for errors when hit finding doesn’t work. It will report misuse of keywords and other problems.

original.ini

Copy of your original ini file (renamed).

cheetah.ini

Same as original.ini but with commented lines removed.

cheetah.out

Full list of parameters used by Cheetah for this hit finding. You can see here if any of your keyword values from cheetah.ini were overwritten automatically due to clashes.

Peakmask.h5

Copy of mask used while peak finding. See keyword peakmask.

peaks.txt

Space separated column file with peak information. One line per peak.

frameNumber, eventName, photonEnergyEv, wavelengthA, GMD, peak_index, peak_x_raw, peak_y_raw, peak_r_assembled, peak_q, peak_resA, nPixels, totalIntensity, maxIntensity, sigmaBG, SNR

Where GMD is a gas monitor detector reading (proportional to incident flux).
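Because peaks.txt has one line per peak, it is straightforward to load for quick statistics. The Python sketch below assumes the sixteen columns appear in the order listed above, that fields are separated by whitespace, that event names contain no spaces, and that any line starting with # is a header or comment. These are assumptions to check against your own file, and the function names are hypothetical.

```python
from collections import Counter

# Column order as listed above for peaks.txt.
PEAK_COLUMNS = [
    "frameNumber", "eventName", "photonEnergyEv", "wavelengthA", "GMD",
    "peak_index", "peak_x_raw", "peak_y_raw", "peak_r_assembled", "peak_q",
    "peak_resA", "nPixels", "totalIntensity", "maxIntensity", "sigmaBG", "SNR",
]


def read_peaks(text: str) -> list[dict[str, str]]:
    """Return one {column: value} dict per peak line; values stay as strings."""
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and assumed '#' header lines
        if len(fields) != len(PEAK_COLUMNS):
            raise ValueError(f"expected {len(PEAK_COLUMNS)} fields, got {len(fields)}")
        peaks.append(dict(zip(PEAK_COLUMNS, fields)))
    return peaks


def peaks_per_event(peaks: list[dict[str, str]]) -> Counter:
    """Count how many peaks were found in each event."""
    return Counter(p["eventName"] for p in peaks)
```

From here, collecting float(p["maxIntensity"]) or float(p["peak_resA"]) across all peaks gives the raw numbers for a saturation or resolution histogram.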

psana.cfg

Copy of psana.cfg from your cheetah/process directory: the configuration file for psana, the LCLS analysis framework.

Status.txt

Status of hit finding.

xtcfiles.txt

List of xtc files from this run. If you started hit finding before all the xtc files had finished writing to offline storage, this list may be incomplete. Cheetah will not report an error in that case, but rerunning it at a later date will show more frames processed. The raw data are saved in /reg/d/psdm/cxi/cxiXXXXX/xtc (XXXXX is the experiment ID, whose last 2 digits correspond to the year of the experiment). While xtc files are being written, .inprogress is appended to their names, and Cheetah deliberately excludes them until the extension is solely .xtc.
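Before launching hit finding on a fresh run, you can check for yourself whether any files are still being written. The Python sketch below mirrors the rule described above (only names ending solely in .xtc count as finished); it is an independent illustration, not the code Cheetah uses, and the function names are hypothetical.

```python
from pathlib import Path


def finished_xtc_files(xtc_dir: str) -> list[str]:
    """Names of files whose final extension is .xtc (so not .xtc.inprogress)."""
    return sorted(
        p.name for p in Path(xtc_dir).iterdir()
        if p.is_file() and p.suffix == ".xtc"
    )


def in_progress_files(xtc_dir: str) -> list[str]:
    """Names of files still being written (ending in .inprogress)."""
    return sorted(
        p.name for p in Path(xtc_dir).iterdir()
        if p.is_file() and p.name.endswith(".inprogress")
    )
```

If in_progress_files returns anything for your run, expect xtcfiles.txt to be incomplete and plan to rerun once the transfer has finished.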

Cheetah output HDF5 contents

This is what may be referred to as "cleaned" data.

In serial femtosecond crystallography, you will typically hit each crystal only once (unless the crystals are large and flowing slowly). If saveCXI=0, each diffraction pattern that Cheetah finds in the raw data stream from LCLS is saved as an individual HDF5 file. If saveCXI=1, all the hits are saved into one large HDF5 file in CXI format, as described in https://github.com/FilipeMaia/CXI/raw/master/cxi_file_format.pdf

HDF5 files are hierarchical, consisting of groups and datasets. Groups can contain other groups and datasets, while datasets hold multi-dimensional data (e.g. diffraction data). More on HDF5 files on the HDF5 page.

If saveCXI=0

Each HDF5 file will have the following groups and datasets (links are also allowed within an HDF5 file):

LCLS_2013_Feb12_r0194_053144_17343.h5
/LCLS
	/detector0-EncoderValue
	/detector0-Position
	/detector1-EncoderValue
	/detector1-Position
	/ebeamCharge
	/ebeamL3Energy
	/ebeamLTUAngX
	/ebeamLTUAngY
	/ebeamLTUPosX
	/ebeamLTUPosY
	/ebeamPkCurrBC2
	/eventTimeString
	/evr41
	/f_11_ENRC
	/f_12_ENRC
	/f_21_ENRC
	/f_22_ENRC
	/fiducial
	/machineTime
	/phaseCavityCharge1
	/phaseCavityCharge2
	/phaseCavityTime1
	/phaseCavityTime2
	/photon_energy_eV
	/photon_wavelength_A
/data
	/data
	/energySpectrum1D
	/energySpectrumCCD
	/energySpectrumScale
	/radialAverage0
	/radialAverage1
	/radialAverageCounter0
	/radialAverageCounter1
	/rawdata
	/rawdata0
	/rawdata1
/processing
	/energySpectrum-tilt
	/hitfinder
		/peakinfo
		/peakinfo-assembled
		/peakinfo-raw
	/pixelmasks

This table shows the structure of the virtual powder patterns, rXXXX-detectorX-classX-sum.h5

r0016-detector0-class1-sum.h5
/data (group)
	/nframes
	/non_assembled_detector_and_photon_corrected
	/non_assembled_detector_and_photon_corrected_sigma
	/non_assembled_detector_corrected
	/non_assembled_detector_corrected_sigma
	/peakpowder
	/radial_average_detector_and_photon_corrected
	/radial_average_detector_and_photon_corrected_sigma
	/radial_average_detector_corrected
	/radial_average_detector_corrected_sigma
/data (link)
	/correcteddata --> /data/non_assembled_detector_corrected
	/data --> /data/non_assembled_detector_corrected

Other 2D datasets may be created when other "savePowder" keywords are set to 1 in the ini file.

If saveCXI=1 (default)

All hits and corresponding metadata are saved in CXI format in a single HDF5 file. The structure of this .cxi file is described in the CXI format documents at www.cxidb.org (see the description of a CXI stack of images from a modular detector).

Cheetah GUI

For LCLS users:

The Cheetah website has detailed instructions for setting up your experimental directories to use the centrally installed Cheetah: http://www.desy.de/~barty/cheetah/Cheetah/Configuration.html

Scripts

Scripts to help with miscellaneous tasks while hit finding and doing preliminary analysis can be downloaded from https://www.bioxfel.org/resources/scripts

peakogram

Quickly plot a histogram of all peaks found in a run. This is a good way to see whether you have a lot of saturation and to estimate resolution from the whole run. Uses peaks.txt.

hits

A script to quickly calculate hits and hitrates from SFX experiments

stream2stats

A script to quickly calculate data quality metrics from SFX experiments

visibly_bad_mask.py

A script to generate masks (bad pixel or peak) manually. Useful for shadows, rings from substrate/sample holder. The resultant binary (0/1) hdf5 mask needs<
