5. Data reduction and hit finding

By Nadia Zatsepin and Richard A Kirian (Arizona State University)



Graphical Image Analysis at LCLS: Psocake

New graphical image analysis software (Psocake) has been developed at LCLS by Chunhong Yoon.

The tasks Psocake supports are described in the full documentation: https://confluence.slac.stanford.edu/display/PSDM/Graphical+Image+Analysis


Psocake SFX tutorial: https://confluence.slac.stanford.edu/display/PSDM/Psocake+SFX+tutorial  

Data reduction and hit finding with Cheetah

Cheetah at LCLS

To get started with Cheetah at LCLS, follow the instructions on Anton Barty's Cheetah website. 

For more details on some topics, please refer to the Cheetah documentation and the list of Cheetah keywords.


New Cheetah GUI

In June 2016, the Cheetah interface was completely re-written in Python. These instructions are copied from Anton Barty's Cheetah website (CFEL, DESY): http://www.desy.de/~barty/cheetah/Cheetah/Cheetah_GUI.html  Please refer there for updates.


Cheetah is divided into two separate parts. Each part can function without the other, although they work best together.


1) A graphical user interface for interacting with data, starting analysis jobs, monitoring status and viewing output.

2) A data processing program that runs separately, ideally on a batch farm.




The purpose of the GUI is to present results in a convenient form, provide an overview of different data sets in an experiment, and send commands to the command line so you do not have to type them in manually.  Commands are echoed to the console (ready for cut-and-paste elsewhere). 


This makes it possible to:


  • Run the GUI separately at home to view data from many experiments without having to install the computing part.
  • Modify the GUI for experiments at a different facility while maintaining a familiar look and feel.
  • Start processing from the command line or from a shell script (e.g. if working on a low bandwidth connection).
  • Launch analysis or data visualization from the command line, or re-use tools such as the CXI file viewer in different ways if desired.
  • Use a different data processing backend by modifying what is sent to the command line.

Cheetah Quick Start

First make sure the cheetah-gui is in your path by sourcing the appropriate setup file, then launch the GUI:


$ source /reg/g/cfel/cheetah/setup.sh   # or setup.csh if you use csh or tcsh

$ cheetah-gui


Note: Both setup scripts do little more than add the cheetah-gui location to your path.

cheetah-gui itself is a shell script that sets paths and environment variables for an Anaconda python3 installation and then calls cheetah-gui.py. This avoids affecting the rest of your setup or causing conflicts with other Python versions. Modifying this script to work at home (or on your laptop) is trivial.

If the GUI starts without error, you will be presented with a dialog box to select which experiment to work on. All experiments you have previously looked at are listed for convenience. If the experiment is not already in the list, it can be added using the 2nd button: simply navigate to the cheetah/gui/crawler.config file for the new experiment. Alternatively, set up a new experiment using the 3rd button.

After this, the table of runs and processed (or unprocessed) data should appear, similar to the screen shot at the top of this page.  More detail on processing options is given on the following pages. If in doubt, follow the instructions for LCLS as this is the most tested and frequently used example. 

Note: The list of experiments is saved in the file ~/.cheetah-crawler, one line per experiment.  This file can be edited manually if needed.
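Since the experiment list is a plain text file, it is easy to inspect from a script before editing it by hand. The following is a minimal sketch, not part of Cheetah: the function name read_experiment_list is made up for illustration, and each line is treated as an opaque string because only the "one line per experiment" layout is stated above.

```python
from pathlib import Path


def read_experiment_list(path=None):
    """Return the non-empty lines of the experiment list file, in file order."""
    # Default location is the one given in the note above; pass a different
    # path to look at a copy instead of the live file.
    list_file = Path(path) if path is not None else Path.home() / ".cheetah-crawler"
    if not list_file.exists():
        return []  # no experiment has been opened yet
    # Assumption: one experiment per line. The content of each line is kept
    # verbatim (minus surrounding whitespace) and not interpreted.
    return [line.strip() for line in list_file.read_text().splitlines() if line.strip()]
```

Calling read_experiment_list() with no argument reads ~/.cheetah-crawler; printing the result with its index makes it easier to spot a stale entry before removing that line in an editor.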


cxiview - a new, improved HDF5 viewer

Cxiview is a new, much improved HDF5 viewer. It is launched from the Cheetah GUI when you click "view hits". It can open .cxi files (multiple frames per HDF5 file) or individual (small) HDF5 files. It can also be used for viewing CrystFEL's indexing progress by displaying the found peaks and predicted (indexed) peaks simultaneously when you give cxiview the stream file.

(Running the cheetah setup scripts puts Cheetah's new HDF5 viewer in your PATH.)



Cheetah Slow Start (IDL GUI)


1. Make a directory in your scratch folder

For example

$ cd /reg/d/psdm/cxi/cxi84914/

$ mkdir <username>

2. Run the Cheetah setup script (add this to your startup scripts if you're a regular user)

This will define a few environment variables that Cheetah needs. You only need to run one of the following and it can be run from anywhere.

$ source /reg/g/cfel/cheetah/setup.csh # if you use csh or tcsh

$ source /reg/g/cfel/cheetah/setup.sh  # if you use bash

3. From within your personal scratch directory, grab the Cheetah template.

Get a fresh copy of the template for each experiment because sometimes it has useful updates that have been quietly added.

$ tar -xvf /reg/g/cfel/cheetah/template.tar

The directories and files you just created are described in the section "Cheetah at LCLS" in the Cheetah documentation. 

[OPTIONAL] Create the links to /res and /scratch by running make-labrynth. Run the following only if you know what you're doing and can figure out what other changes you may need to make. Don't do this the first time.  

$ /reg/g/cfel/cheetah/cheetah-latest/bin/make-labrynth 

4. Update the GUI scripts and see whether you can find the .xtc files and update the file list.  Instant success #1.

Edit  cheetah/gui/crawler.config with your favorite editor, e.g. nano, vi, gedit.

xtcdir should point to the location of your XTC files. hdf5dir should point to where you want the Cheetah output to go. By default let this be the cheetah/hdf5 directory expanded by the template (but it can be elsewhere). The rest can be left alone for now.
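A quick sanity check of the two directories can save a confusing first launch. The sketch below is an illustration only and makes one loud assumption: that crawler.config holds simple key=value lines (for example xtcdir=... and hdf5dir=...), with "#" starting a comment. Check this against your own file; the helper names are invented here and are not part of Cheetah.

```python
import os


def read_crawler_config(path):
    """Parse 'key=value' lines into a dict, skipping blanks and '#' comments."""
    config = {}
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            # Assumed layout: anything that is not key=value is ignored.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            config[key.strip()] = value.strip()
    return config


def check_crawler_dirs(config):
    """Return a list of readable problems with the xtcdir and hdf5dir entries."""
    problems = []
    for key in ("xtcdir", "hdf5dir"):
        value = config.get(key)
        if not value:
            problems.append(f"{key} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{key} does not exist: {value}")
    return problems
```

For example, check_crawler_dirs(read_crawler_config("cheetah/gui/crawler.config")) returns an empty list when both directories are set and present.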

5. Modify the process script (cheetah/process/process) and check you can make a darkcal from any old run.  

(1A)  Edit the experiment name "expt" in your process script. 

(1B) Add your scratch directory name (probably your username) between scratch and cheetah for H5DIR and CONFIGDIR.  

The rest should not need modification, and should be self-explanatory if changes need to be made (for example, configuration files in ${expt}/res instead of ${expt}/scratch if you set up your directories that way).


This script is a front-end to making Cheetah run, enabling data to be processed from the command line by simply typing the one-line command

$ ./process <Run#> <cheetah.ini> <tag>

This is the command executed by the ‘Run Cheetah’ button on the front of the GUI.
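Because processing is a one-line command, several runs can be queued from a small script, which is handy on a low-bandwidth connection. The sketch below is an illustration under stated assumptions: the helper names are invented, the script is assumed to be called from the cheetah/process directory, and run numbers are passed as plain integers (adjust if your process script expects another form).

```python
import subprocess


def process_commands(runs, ini_file, tag, script="./process"):
    """Build one argument list per run: script, run number, ini file, tag."""
    return [[script, str(run), ini_file, tag] for run in runs]


def run_all(commands, dry_run=True):
    """Echo each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))  # echoed so it can be copied and pasted elsewhere
        if not dry_run:
            # check=True stops at the first run whose script exits with an error
            subprocess.run(argv, check=True)
```

For example, run_all(process_commands(range(12, 15), "lys.ini", "test")) only prints the three commands; pass dry_run=False to actually launch them. The run numbers and tag here are placeholders.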

(2) Launch the Cheetah GUI (open a second tab or a new terminal and log in to psana again; the GUI will repeatedly post output to it)

$ cheetah-gui 

Start the GUI using the cheetah-gui command, as described on the getting started page.


Click the IDL virtual machine box to dismiss it. (The Cheetah GUI uses the IDL virtual machine as an environment, which can be downloaded and used for free if you want to run the image browser at home.) 

You should then be presented with a list of all previous experiments.  

(3) Either select one of the existing experiments, or use ‘Different Directory’ to navigate to the cheetah/gui folder described above, select "crawler.config", and click OK.

This should bring up the interface shown on the getting started page. 

(4) Click on a dark run (or any run, for testing), click "run cheetah", make sure the ini file name is darkcal.ini, click Run. 

More details are given in step (4) on the official Cheetah at LCLS page. Those should get you through making a dark current measurement (from any run) and making a bad pixel map from this darkcal.h5 using Tools, then "Create bad pixel map from darkcal".


6. Create a copy of lys.ini, update the darkcal and bad pixel masks. 

Run Cheetah with your new ini file on some data run(s). See whether you find any peaks at all. Hopefully instant success #3. The parameters in lys.ini find diffraction spots from almost all samples, but will definitely need tuning to get the most out of your particular data.

This step is easy if you can actually find some hits, as it quickly confirms things are working.

Debugging a data set with no hits is much harder as there's no feedback - another reason why a calibration sample is very useful.


Tuning hit finding parameters and understanding Cheetah output

Cheetah's output is created in /scratch/<username>/cheetah/hdf5 . The files that Cheetah creates in your cheetah/hdf5 directory are described in detail under "Cheetah output files" on the Cheetah documentation page.

Cheetah behavior is specified by the user through a configuration (.ini) file, an example of which is included in the tarball (lys.ini). Within a configuration file is a list of keywords that Cheetah recognizes, and the user-specified values.
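When tuning, it helps to see exactly which keywords differ between two versions of a configuration file. The following is a rough sketch, not a Cheetah tool: it assumes the .ini file consists of keyword=value lines with "#" starting a comment, and the function names are invented for illustration. Consult the list of Cheetah keywords for the real keyword names and meanings.

```python
def read_keywords(text):
    """Collect 'keyword=value' pairs from ini text, ignoring blanks and comments."""
    keywords = {}
    for raw in text.splitlines():
        # Assumption: '#' starts a comment, so values containing '#' are not supported.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        keywords[key.strip()] = value.strip()
    return keywords


def changed_keywords(old_text, new_text):
    """Map each keyword whose value differs to an (old, new) pair; None means absent."""
    old, new = read_keywords(old_text), read_keywords(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }
```

For example, changed_keywords(open("lys.ini").read(), open("my-sample.ini").read()) lists only the keywords you have altered (my-sample.ini is a placeholder name), which makes it easier to relate a change in hit rate to a specific parameter.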

Cheetah's keywords, hit finding parameters, and hints for troubleshooting are explained on the Cheetah documentation page.


All of Cheetah's keywords explained