This README documents whatever steps are necessary to get "QtClassify" up and running.
What is this repository for?
QtClassify is a GUI that helps you classify emission lines found in MUSE datacubes. The main idea is to take each detected line and guess which line it could be (and thus the redshift of the object). If the guess is right, you would expect other lines to be visible in the cube at predictable positions, even if they were not detected, which is why QtClassify shows the parts of the spectrum where those other lines are expected. In addition, monochromatic layers of the MUSE datacube are displayed, depending on where in the spectrum you are looking. As input you need a MUSE datacube (if possible, use a median-filter-subtracted one to get rid of the continuum emission) as well as a catalogue of emission lines and a signal-to-noise cube (both provided, for example, by LSDCat). Optionally, you can also provide a broadband (HST) image for comparison, but this is not needed.
How do I get set up?
For QtClassify to run you will need python version 2.7.x and the following packages:
- numpy (tested with version 1.10.14)
- astropy (tested with version 1.0.6)
- scipy (tested with version 0.14.0)
- pyqtgraph (tested with version 0.9.10)
You can get pyqtgraph from http://pyqtgraph.org/ or install it with pip.
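If you want to check quickly whether the required packages are importable before launching the GUI, a small helper like the following can be used (this is just a convenience sketch, not part of QtClassify; it only tests importability, not the exact versions):

```python
# Quick dependency check for QtClassify (illustrative helper, not part of
# the tool itself).
import importlib

REQUIRED = ["numpy", "astropy", "scipy", "pyqtgraph"]

def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    gaps = missing_packages(REQUIRED)
    if gaps:
        print("Please install: " + ", ".join(gaps))
    else:
        print("All QtClassify dependencies found.")
```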
And you will also need LSDCat to give you a catalogue and an S/N cube. If you are a Mac user, you will need to install Qt first. If you use Windows, get Linux.
The LSDCat catalogue will have to contain the following columns:
- ID of the object
- X position in pixels
- Y position in pixels
- Z position in pixels
- RA in degrees
- DEC in degrees
- wavelength of the line in Angstrom
- maximum detection S/N of the line (DETSN_MAX in LSDCat)
Now you are ready to run line_classification_GUI_pyqtgraph.py, the main programme of QtClassify. Type "line_classification_GUI_pyqtgraph.py --help" on your command line to see how to specify which files QtClassify should use and what the output file should be called. You will see something like this:
    usage: line_classification_GUI_pyqtgraph.py [-h] [-id INPUTDATA] [-isn INPUTSN]
                                                [-c CATALOG] [-o OUTPUT] [-F FLUXHDU]
                                                [-N NOISEHDU] [-c_ID COLUMN_ID]
                                                [-c_X COLUMN_X] [-c_Y COLUMN_Y]
                                                [-c_Z COLUMN_Z] [-c_RA COLUMN_RA]
                                                [-c_DEC COLUMN_DEC] [-c_LAM COLUMN_LAM]
                                                [-c_SN COLUMN_SN] [-hst HSTIMAGE]
                                                [-v VACUUM] [-e EFF_NOISE]
                                                [-rcn REPLACECUBENANS]

    line_classification_GUI_pyqtgraph.py - classify lines with a GUI!

    optional arguments:
      -h, --help            show this help message and exit
      -id INPUTDATA, --inputdata INPUTDATA
                            Input flux cube FITS file (e.g. median filtered fluxcube).
      -isn INPUTSN, --inputsn INPUTSN
                            Input S/N cube FITS file from LSDCat.
      -c CATALOG, --catalog CATALOG
                            Input catalog FITS file from LSDCat.
      -o OUTPUT, --output OUTPUT
                            Output catalog FITS file.
      -F FLUXHDU, --fluxhdu FLUXHDU
                            HDU number (0-indexed) of flux in input flux cube FITS
                            file (default: 1).
      -N NOISEHDU, --noisehdu NOISEHDU
                            HDU number (0-indexed) of variance in input flux cube
                            FITS file (default: 2). If set to -1, the noise will
                            not be used (saves memory).
      -c_ID COLUMN_ID, --column_ID COLUMN_ID
                            Column name in input FITS catalog for ID, default=ID.
      -c_X COLUMN_X, --column_X COLUMN_X
                            Column name in input FITS catalog for x position (in
                            pixel), default=X_SN.
      -c_Y COLUMN_Y, --column_Y COLUMN_Y
                            Column name in input FITS catalog for y position (in
                            pixel), default=Y_SN.
      -c_Z COLUMN_Z, --column_Z COLUMN_Z
                            Column name in input FITS catalog for z position (in
                            pixel), default=Z_SN.
      -c_RA COLUMN_RA, --column_RA COLUMN_RA
                            Column name in input FITS catalog for RA (in degrees),
                            default=RA_SN.
      -c_DEC COLUMN_DEC, --column_DEC COLUMN_DEC
                            Column name in input FITS catalog for DEC (in
                            degrees), default=DEC_SN.
      -c_LAM COLUMN_LAM, --column_LAM COLUMN_LAM
                            Column name in input FITS catalog for lambda
                            (wavelength in Angstrom), default=LAMBDA_SN.
      -c_SN COLUMN_SN, --column_SN COLUMN_SN
                            Column name in input FITS catalog for maximum signal
                            to noise of the found line (DETSN_MAX in LSDCat),
                            default=DETSN_MAX.
      -hst HSTIMAGE, --hstimage HSTIMAGE
                            HST image for comparison.
      -v VACUUM, --vacuum VACUUM
                            Convert the wavelength of found lines from air to
                            vacuum for determining the redshift. Set to 1 if you
                            want the conversion, set to 0 if not.
      -e EFF_NOISE, --eff_noise EFF_NOISE
                            Instead of using a noise cube (or the noise HDU of a
                            datacube) it is also possible to use the 1D effective
                            noise, which is scaled according to the aperture size.
      -rcn REPLACECUBENANS, --replaceCubeNaNs REPLACECUBENANS
                            Set this keyword to "False" to skip the memory- and
                            time-consuming replacement of NaNs with 0s when
                            loading data cubes.
Don't worry, most of the parameters have sensible default values. You do have to specify the first few, though: -id for the input datacube, -isn for the S/N cube, -c for the input catalogue with the found emission lines and -o for the output catalogue. It is also worth making sure that the right flux HDU (-F) and noise HDU (-N) of your datacube are used, and that the correct columns of your input catalogue are read, which can be specified with the -c_ options. The HST image is not needed, but if you want one, use -hst. Finally, -v (for vacuum) activates the conversion of the wavelengths of the found emission lines from air to vacuum before they are compared to the input line list. Once your files are specified, QtClassify is ready to run.
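A typical invocation could then look like this (all file names here are placeholders for your own files):

```shell
python line_classification_GUI_pyqtgraph.py \
    -id datacube_median_filtered.fits \
    -isn sn_cube.fits \
    -c lsdcat_catalog.fits \
    -o my_classifications.fits \
    -F 1 -N 2
```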
Here's what you see:
- The top panels show the positions in the spectrum where lines are expected. The thick line is the line that was found and is in the catalogue. The green cross marks the position of the mouse for orientation, the grey dotted line is the zero line. The magenta line is the error spectrum in arbitrary units (if you specified the noise HDU). The red vertical line marks the position where the line was found; the grey vertical lines mark positions where one would expect other lines.
- The middle row of panels shows the monochromatic MUSE layers (cutouts) at the position of your mouse in the top panels (move your mouse around to see the line appear and disappear!).
- The bottom panels show the same positions in the S/N cube to give you an idea of the detection significance. The name of the line shown in each column is written under the S/N panels (and on the vertical lines).
- At the bottom you see the full spectrum, with regions marking the positions and spectral widths of the top panels in light blue and that of the detected line in grey. The zero line is light green and the positions of other possible lines are marked by magenta vertical lines.
- If you provided an HST image, it is displayed on the right.
Here's what you can do:
First of all, you can look at the line and the datacubes. When you move your mouse in the top panels, the layer shown in the cutouts changes accordingly. There are currently four options for your line guess: Lya, OII, OIII and Ha, and you can change the guess by clicking the buttons at the top. If, for one of these guesses, you can see the other expected lines as well, this is a strong indication that the guess is correct. You can then identify the line using the drop-down menu on the right titled "Your identification". This automatically computes the redshift and assigns identifications to any other line found in this object.
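The redshift computation behind this is simple: z = lambda_observed / lambda_rest - 1, and the expected positions of the other lines follow by scaling their rest wavelengths by (1 + z). A sketch of the idea (the rest wavelengths below are approximate vacuum values chosen for illustration; the values QtClassify actually uses live in custom_lines.py):

```python
# Approximate vacuum rest wavelengths in Angstrom (illustrative values only;
# QtClassify reads its own list from custom_lines.py).
REST_WAVELENGTHS = {"Lya": 1215.67, "OII": 3727.1, "OIII": 5008.2, "Ha": 6564.6}

def redshift(observed_lambda, line_name):
    """Redshift implied by identifying an observed line with `line_name`."""
    return observed_lambda / REST_WAVELENGTHS[line_name] - 1.0

def predicted_position(line_name, z):
    """Observed wavelength at which `line_name` is expected at redshift z."""
    return REST_WAVELENGTHS[line_name] * (1.0 + z)

# Example: a line found at 4862.68 A identified as Lya implies z = 3,
# so companion lines can be looked up at their (1 + z)-scaled positions.
z = redshift(4862.68, "Lya")
```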
At the bottom of the GUI there are three horizontal sliders. The first slider smooths the spectrum with a Gaussian of a given sigma (which is what the slider adjusts). This is useful if the spectrum is very noisy. The default value is 0, i.e. no smoothing, so be sure to play around with the slider if needed. For noisy lines that might be Lya, it helps to smooth a little to see whether the line is asymmetric. The second slider zooms all parts of the spectrum simultaneously (watch the regions in the full spectrum grow and shrink accordingly). The third slider changes the aperture size used for extracting the spectra. If you specified an HST image, the aperture is shown as the green circle. Changing the aperture is useful if there are multiple objects close to each other and you only want the spectrum of the central one. It also helps to distinguish between Lyman alpha and, for example, OII: the Lyman alpha halo of Lyman alpha emitters is usually very extended, so the flux keeps increasing with increasing aperture, which would not be the case for other lines. There is one more slider, vertical and to the left of the full spectrum, which adjusts the y-axis of the spectrum.
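Conceptually, the smoothing applied by the first slider is just a convolution with a Gaussian kernel of the chosen sigma. A rough standalone sketch of that idea (not QtClassify's actual implementation, which presumably operates on numpy arrays):

```python
# Sketch of Gaussian smoothing of a 1-D spectrum; illustrative only.
import math

def gaussian_smooth(values, sigma):
    """Smooth a 1-D spectrum with a Gaussian of width `sigma` (in pixels).
    sigma == 0 returns the input unchanged, matching the slider's default."""
    if sigma == 0:
        return list(values)
    half = int(math.ceil(3 * sigma))  # truncate the kernel at 3 sigma
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    norm = sum(kernel)
    kernel = [w / norm for w in kernel]
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # clamp at the spectrum edges
            acc += w * values[j]
        smoothed.append(acc)
    return smoothed
```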
The drop-down menu under "Your identification" lets you identify the found line. As mentioned above, once you select a classification for one line, it automatically matches the other found lines using the assumed redshift. If you want to be even faster, you can click "Auto identify!" to let QtClassify identify all objects with multiple detected lines automatically. It is strongly recommended to have a quick look at those objects anyway. If you don't want to go through all lines in one object, check the box "Show only strongest line". That way, only the line with the highest S/N is displayed. This is useful for objects with many detected emission lines, such as bright galaxies, but also for continuum emission residuals detected in bright stars.
Quality and Confidence
Next to the drop-down for identification, you can also enter a quality and a confidence level. Read the tooltips (hover the mouse over them) to learn what is what. Note that when QtClassify identifies automatically, it assigns quality 'a' and confidence '3' to every line in an object where all lines could be matched. If at least one line in an object could not be matched automatically, quality and confidence are set to 'c' and '0' and the identification reads 'no match'. Check these objects again to see what is going on: there is either an interloper line that could not be identified, a line that is missing from the line list in use, or the identification was not accurate. If the object has only one detected line to begin with, the quality is also automatically set to 'c'. If you saw additional lines that are not in the catalogue but are still visible in the S/N cube, for example, this makes your identification more robust and you can manually raise the quality to 'b'.
When you are done with all lines, or you don't want to go on, just click the cross in the top right corner. Your identifications will be saved in the output catalogue (specified with -o on the command line). Be careful, though, as this file will be overwritten. To avoid any accidental loss of classifications, QtClassify checks whether the file you set as output catalogue already exists; in that case you can either load it and continue your classification, or rename your output catalogue and begin again. The output catalogue has the same columns and content as your input catalogue, with the addition of columns for:
- short names
Note that each time you move forward or backward in your catalogue to classify, the output catalogue will automatically be written, so there is no loss of classifications if the programme should crash.
- Use the mouse: You can zoom by using the right mouse button in one of the top panels. Using the left mouse button lets you shift the spectrum and the cutouts around.
- Use the regions: You can shift the regions in the full spectrum at the bottom to go to a different position in the spectrum. These regions can also be changed in size to zoom.
- Go back and forth: If you are satisfied and want to go on to the next line, just click the "next" button. You can go back by clicking "previous". Don't worry, your former identification will not be lost: you can see it in the "Comment", "Quality", "Identification" and "Redshift" lines, so you can change your mind and adjust your identification if you want, or simply go on.
- Go somewhere else: If you want to revisit an object, enter the ID in the top right line edit and click on 'Jump'. Note that this will jump to the object with the ID you entered, not the line. If you have 'Show only strongest line' activated, it will jump to this emission line. Next to the 'Jump' button there is information about the next unclassified line. This is useful if you used the automatic identification or if you stopped your classification process at some point and want to resume it again.
- Association: Next to the line edit for comments, you can enter an association. It sometimes happens that LSDCat splits one big object into two. If you notice this has happened, enter the ID of the object it belongs to in the association line edit and it will be saved in the output catalogue.
- Where am I? You can see the RA and DEC positions of your object at the bottom right corner. If you want to get an idea where you are in your data, simply use the HST image and zoom out.
- Redshift: You can shift the red line (the found line) in the top panels to the actual centre of the emission line (in case it was not found correctly). This position is then used to calculate the redshift. Since this is a redshift-by-eye determination, it is recommended to use a fitting tool afterwards for a more precise redshift.
In the bottom right corner of QtClassify there are several line edits that let you customize the cut levels of the displayed data. The first row is for the actual data, the second for the S/N image. This is especially helpful since it lets you see directly what signal-to-noise ratio the line you are looking at has. Just change the value and hit enter to apply the change. There is a similar line edit above the HST image that does the same there. You can also invert the colours by simply swapping the cut values.
You want different options in the identification drop-down menu? Your favourite line is missing? You want HeII as an initial guess at the top? Use custom_lines.py and custom_options.py. They contain simple Python lists and dictionaries that store the lines, wavelengths and possible identifications. To change which lines can be used as initial guesses, go to custom_options.py and edit 'possible_lines_names' and 'dict_other_lines' (the file also explains how to do that). To change the wavelengths or identifications in the drop-down menu under 'Your identification', go to custom_lines.py and follow the instructions therein.
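For orientation, the kind of structures involved might look like the sketch below. The names 'possible_lines_names' and 'dict_other_lines' come from custom_options.py, but the values and exact layout here are purely illustrative, so check the real files before editing:

```python
# Purely illustrative sketch -- the real definitions (and their exact
# structure) live in custom_options.py / custom_lines.py.
possible_lines_names = ["Lya", "OII", "OIII", "Ha", "HeII"]  # initial-guess buttons

# Hypothetical mapping from each initial guess to the companion lines the
# GUI could display alongside it.
dict_other_lines = {
    "Lya": ["CIV", "HeII", "CIII]"],
    "OII": ["Hb", "OIII", "Ha"],
}
```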
Tips for Users
When opening QtClassify, be a little patient; the datacubes are probably huge. Once everything is loaded, it should be sufficiently fast. Make sure you used the right input and output catalogue file names as well as the correct HDUs for the datacubes! When the GUI appears, first make sure you like the cuts. Next, it is useful to automatically identify all the easy objects, which saves time. Now you can go through your catalogue and have fun classifying! But be careful, cases of classification addiction have been reported. ;)
I would love to get feedback, so feel free to test the programme and tell me if you find bugs or things that could be improved. Please use the Bitbucket issue tracker and create an issue/bug report if you have suggestions or requests.
QtClassify is licensed under a three-clause BSD license. For details see the file LICENSE in the QtClassify repository.