TriggerTelescopeSync command sometimes does not update the Dome Azimuth parameter, causing unchecked/out-of-control rotation of the dome.

Issue #731 resolved
Stephen Bogner created an issue

The priority may need to be higher than “Major”, since this is a potential safety issue that could damage equipment or cause injury.

The issue has been reproduced (see the attached trace log file).

I am running NINA 1.11 #028 with the NexDome Control System driver from Tigra Astronomy, version 3.2 (build 3.2.0-alpha.18).

Description

After a period in which the dome tracks/syncs with the telescope as expected, it suddenly enters a protracted cycle in which it rotates continuously through 360 degrees, many times over many minutes, before eventually catching and correcting itself.

Steps to Reproduce

  • Open Dome Shutter.
  • Place Telescope at Home position.
  • Toggle “Dome follows telescope” in NINA Dome Interface.
  • Use Sequencer to send Telescope to target coordinates (I used NGC 891) with sidereal tracking.
  • Wait for failure.

Expected Behavior

The dome should track normally, activating every few minutes to rotate a small amount and keeping the telescope pointed through the dome slit.

Actual Behavior

As demonstrated by the Log File:

  1. Dome connects normally.
  2. “Dome follows telescope” toggle results in a TriggerTelescopeSync command which immediately rotates the dome to align with the telescope, as expected. (At line 159 the first TriggerTelescopeSync corresponds with the user selecting “Dome follows Telescope” in the NINA Dome Interface, taking the dome from the Park position to align with the Telescope Home position.)
  3. Running the sequence causes the telescope to slew to the target, and the dome follows as expected. At line 195, TriggerTelescopeSync logs [Message] Dome direct telescope follow slew. Current azimuth=328° 59' 13", Target azimuth=180° 47' 55", Tolerance=01° 00' 00", followed by WaitForDomeSynchronization logging [Message] Dome not synchronized. Waiting... The first wait takes about 14 seconds; subsequent waits take about 1 second each, with the message repeating until line 319. The total duration of the slew is 46 seconds. This corresponds to the dome following the telescope to the target position, as per the sequence.
  4. The next TriggerTelescopeSync, at line 359, appears (I am not certain) to be triggered by a meridian-crossing event: [Message] Dome direct telescope follow slew. Current azimuth=180° 47' 51", Target azimuth=181° 50' 18", Tolerance=01° 00' 00". At this point the current and target azimuths are more than the tolerance apart. This is followed by pairs of TriggerTelescopeSync messages in the pattern [Message] Cannot synchronize with telescope while dome is slewing, then [Message] Dome direct telescope follow slew. Current azimuth=180° 47' 51", Target azimuth=181° 57' 22", Tolerance=01° 00' 00". This pattern repeats from line 363 to line 407 (32 seconds), ending at line 409 with [Message] Dome direct telescope follow slew. Current azimuth=180° 47' 51", Target azimuth=182° 26' 13", Tolerance=01° 00' 00". Significantly, the current azimuth does not change during these 32 seconds, while the target azimuth does, moving even further beyond the tolerance.
  5. From line 411 to line 429, five further TriggerTelescopeSync commands are issued over 20 seconds, all with the same current azimuth but a changing target azimuth.
  6. At line 431 the TriggerTelescopeSync command shows that the current azimuth has finally updated. At this point the dome seems to begin tracking normally, activating every few minutes to rotate a small amount. However, the log shows many ignored TriggerTelescopeSync commands and anomalous messages stating that the dome cannot sync with the telescope while the dome is slewing, even when the dome is visibly stationary between slews. I believe the only times the dome actually tracked/slewed were on the lines immediately before the current azimuth updated. In other words, whenever the log shows [Message] Cannot synchronize with telescope while dome is slewing, the dome was not actually slewing. I timed the period between a series of normal/expected tracking slews with a stopwatch and found that they coincided with these current-azimuth update events in the log (see table).
Log line | Log timestamp            | Interval (log) | Interval (stopwatch) | Comments
429      | 2020-12-08T19:41:43.0432 |                |                      | Meridian slew
489      | 2020-12-08T19:43:39.7404 | 1:56           |                      | 15 commands ignored?
573      | 2020-12-08T19:45:42.5420 | 2:03           |                      | 13 commands ignored? Says dome is slewing when it is not.
657      | 2020-12-08T19:47:45.1518 | 2:03           |                      | Start stopwatch. Pattern continues…
741      | 2020-12-08T19:49:49.6376 | 2:04           | 2:04                 |
825      | 2020-12-08T19:51:52.1544 | 2:03           | 2:03                 |
893      | 2020-12-08T19:53:55.0010 | 2:03           | 2:03                 |
957      | 2020-12-08T19:55:59.6707 | 2:04           | 2:04                 |
1025     | 2020-12-08T19:58:10.4668 | 2:11           | 2:10                 |
1109     | 2020-12-08T20:00:21.1539 | 2:11           | 2:11                 |
1201     | 2020-12-08T20:02:37.6897 | 2:16           | 2:16                 | At this point it went out of control. I shut off “Dome follows telescope”.
1229     | 2020-12-08T20:04:32.1052 | 1:55           |                      |
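The log intervals in the table were worked out by hand. As a cross-check, they can be recomputed directly from the logged timestamps. A minimal Python sketch follows; the helper name intervals_seconds is just illustrative, and the only assumption is that the timestamps keep the exact format shown in the table. Each recomputed gap comes out within one second of the hand-written value (for example, 116.70 s for line 489 against 1:56).

```python
from datetime import datetime

# (log line, timestamp) pairs copied from the table above
ROWS = [
    (429, "2020-12-08T19:41:43.0432"),
    (489, "2020-12-08T19:43:39.7404"),
    (573, "2020-12-08T19:45:42.5420"),
    (657, "2020-12-08T19:47:45.1518"),
    (741, "2020-12-08T19:49:49.6376"),
    (825, "2020-12-08T19:51:52.1544"),
    (893, "2020-12-08T19:53:55.0010"),
    (957, "2020-12-08T19:55:59.6707"),
    (1025, "2020-12-08T19:58:10.4668"),
    (1109, "2020-12-08T20:00:21.1539"),
    (1201, "2020-12-08T20:02:37.6897"),
    (1229, "2020-12-08T20:04:32.1052"),
]

# strptime's %f accepts 1 to 6 fractional digits and pads them on the right,
# so ".0432" is read as 0.0432 s
FMT = "%Y-%m-%dT%H:%M:%S.%f"


def intervals_seconds(rows):
    """Return (line, seconds since the previous row) for every row after the first."""
    times = [(line, datetime.strptime(ts, FMT)) for line, ts in rows]
    return [
        (line, (t - prev_t).total_seconds())
        for (_, prev_t), (line, t) in zip(times, times[1:])
    ]


for line, secs in intervals_seconds(ROWS):
    minutes, seconds = divmod(secs, 60)
    print(f"line {line}: {int(minutes)}:{seconds:05.2f}")
```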

Issues that seem to arise from the log include:

  1. Why are TriggerTelescopeSync commands ignored, so that they have to be repeated many times before being acted on? If this is by design, perhaps a trace message stating the reason for not acting on the command should be written to the log?
  2. Why is the message “Cannot synchronize with telescope while dome is slewing” issued when the dome is not, in fact, slewing? I note that when the dome actually is slewing (lines 195-319), WaitForDomeSynchronization issues “Dome not synchronized. Waiting…” instead. Both messages come from the same file, E:\Projects\nina\NINA\ViewModel\Equipment\Dome\DomeFollower.cs, so the inconsistency seems suspicious.

I will attempt to get another log while keeping careful track of dome rotation behavior, so that slewing and out-of-control (OOC) events can be correlated with times in the log.

Comments (12)

  1. George Hilios

    I believe this is MinorAce from Discord? If so, I have looked at the logs. The described behavior doesn’t match what I’m seeing in the logs, but I do know that the 3.2 NexDome driver had a number of issues, and 3.3 corrected many of them and added a higher-precision slew that allows better than integer-degree granularity.

    This is a topic I raised in the documentation, but I admit it would be hard for a dome owner to know their slew precision without knowing to ask their vendor. Please upgrade to 3.3 and let me know if this issue keeps happening. I suspect you’re hitting a driver bug where requesting a slew very close to the current position can send the dome rotator on a goose chase.

  2. Stephen Bogner reporter

    Yes, I took a much closer look at the logs and revised my understanding quite a bit from what I had on Discord; I think the account here is more sound. I will upgrade to 3.3 and see if this helps. Just to be clear: I used the NexDomeControlSystem.3.2.0-alpha.18.exe installer on 28 October 2020. This gave me the Rotator-3.3.0.hex and Shutter-3.3.0.hex files, as well as the Rotator-3.4.0-alpha.9.hex and Shutter-3.4.0-alpha.hex files, along with the NexDome-Firmware/Uploader/TA.NexDome.FirwareUpdater.exe file. I had already updated both the rotator and shutter to 3.3.0 firmware. Are you recommending that I upgrade to 3.4.0-alpha, or is there an even more recent version to use?

  3. Stephen Bogner reporter

    I have gone ahead and captured another log file this afternoon using my existing configuration, so that it can serve as a baseline for comparison after updating to 3.4.0-alpha. The runaway behavior was repeated several times in this log, and no proper tracking occurred at all during this test. I also recorded a video of the screen while one of the runaways occurred, so we can compare what was displayed in NINA with what was in the logs.

    1. In the TA.Nexdome.Server-2020-12-08.log I see the message: "DelimitedMessageStrings"[1]: "OnNext"(":FRR3.4.0-alpha.9#"). Am I correct that this means I am already running the 3.4.0-alpha version of the NexDome driver?
    2. The new log for NINA can be found at: https://1drv.ms/u/s!AlSF_aJSkaKVuzEjrrCsljUeIQMM?e=DbzFCi
    3. The first TriggerTelescopeSync command at 2020-12-14T15:35:13.1606 moves the dome from the park position to align with the telescope home position, as usual.
    4. At [2020-12-14T15:36:19.5492] we see the sequence-directed slew to the target, which completes successfully at [2020-12-14T15:36:59.4384].
    5. At [2020-12-14T15:38:06.4582] we see what should have been a normal tracking slew turn into an uncontrolled slew that lasts 50 seconds until [2020-12-14T15:38:56.8479].
    6. At [2020-12-14T15:40:19.1140] we see another presumed tracking slew that turns into an uncontrolled slew that lasts about 2 minutes.
    7. At [2020-12-14T15:45:22.3666] we see another uncontrolled slew that lasts just over 3 minutes 15 seconds, which was captured in the following video: https://photos.app.goo.gl/xh4UfijrEZUtvBbc7 . In the video you can read the azimuth reported in the Dome and Telescope tabs of the NINA Imaging interface, as well as the local system time. I would say that the azimuths are correct, although they update quite slowly and in large jumps. If this update rate reflects how the dome position is updated in the program’s internals, it could be an issue: I would want to confirm that the sampling rate is high enough for the system to be controlled. I was also surprised that obvious overshoots, visible when comparing the azimuth displayed in the NINA interface with the target of the issued TriggerTelescopeSync command, did not cause the dome to reverse direction and home in on the commanded position, as I would have expected.
    8. At this point I felt that we had enough data and parked the telescope and dome. The dome seemed unable to determine its position and had to be parked manually, even after running the “Home” routine. It seems that it may have lost track of where it was during the proceedings…

    Hopefully this illustrates the behavior, even though very little (nothing, really) is being written into the log during the OOC events themselves.

  4. George Hilios

    Thank you for the detailed timeline! What’s going on is very perplexing. My immediate recommendation to you is to increase the threshold from 1 to 2 degrees. I believe there may be a bug in the driver (probably firmware) that results in a slew that goes the long way around when there’s a very short distance to travel (say, just over 1 degree).
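    To make “going the long way around” concrete: a rotator asked to move just over 1 degree should pick the shorter of the two possible directions. The Python sketch below is an illustration only, not the NexDome driver or firmware code, and the function names are hypothetical. For the first out-of-tolerance request in the log (180° 47' 51" to 181° 50' 18"), the shortest move is about +1.04° toward increasing azimuth. The wrap-around case across north is a hypothetical example of how a plain difference without wrap handling ends up going almost all the way round; what actually causes the driver behavior is not established in this thread.

```python
def shortest_delta(current_deg: float, target_deg: float) -> float:
    """Signed rotation from current to target, in degrees, in (-180, 180].

    Positive means rotate toward increasing azimuth; negative means
    toward decreasing azimuth. The magnitude is never more than 180.
    """
    delta = (target_deg - current_deg) % 360.0  # Python's % keeps this in [0, 360)
    if delta > 180.0:
        delta -= 360.0  # going the other way round is shorter
    return delta


def naive_delta(current_deg: float, target_deg: float) -> float:
    """Plain difference with no wrap handling, for comparison."""
    return target_deg - current_deg


# First out-of-tolerance request from the log: 180° 47' 51" -> 181° 50' 18"
current = 180 + 47 / 60 + 51 / 3600
target = 181 + 50 / 60 + 18 / 3600
print(f"{shortest_delta(current, target):+.3f}")  # a short move of about +1.04°

# A short move across north, where the naive difference goes the long way round
print(f"naive {naive_delta(359.5, 0.7):+.1f} vs shortest {shortest_delta(359.5, 0.7):+.1f}")
```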

    A few observations/comments:

    1. The way versioning is reported by NexDome is apparently quite weird. You have the 3.4 driver but the ASCOM driver says 3.2. This is actually newer than the one I have! Mine says 3.1. I’m guessing I have a 3.1 ASCOM driver and a 3.2 firmware. Perhaps consider downgrading to see if we can isolate it to a driver issue?
    2. Can you report this to Tim Long? I can support you as well. What seems to be happening is that NINA is issuing a slew over very short distances and the rotator goes the other way, all the way around. The log has precise values as reported by the driver, so these might be good test cases for Tim to reproduce. I’m reasonably sure that there’s a driver bug here, and that there’s little NINA can do except be less aggressive about synchronizing, which is why I recommend increasing the threshold to 2.
    3. I can try upgrading my dome driver soon too, so I can experience these issues firsthand.
    4. Whenever you see “TriggerTelescopeSync”, NINA immediately sends the driver a slew request to the reported azimuth. At that point, it is in the driver’s hands. I happen to know that if the NexDome is slewing in any manner (rotator or shutter), it ignores slew requests until it has stopped moving. It may also ignore requests if it thinks it has nothing to do (say, if it is already close enough). When you see this message repeatedly, that is NINA checking every 2 seconds or so whether the current azimuth is far enough from the target to issue a move. When there’s a gap between messages (like the periodic 2-minute gaps you see), NINA is still waiting for a previous slew request it initiated to complete. Lastly, the final message you see, something about “Dome is still slewing”, can happen if something other than the dome follower initiated a rotation or a shutter open/close.
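    The per-poll behavior described in point 4, combined with the report’s request for a trace message explaining why a sync is not acted on, can be sketched as a single decision that always returns a reason. This is a hypothetical Python model, not NINA’s DomeFollower.cs: the names follow_decision and dms_to_deg and the order of the checks are assumptions. In a real follower it would be called every 2 seconds or so. Fed the logged values, it issues a follow slew at a 1° tolerance (the error is about 1.04°) but skips at 2°.

```python
import re


def dms_to_deg(text: str) -> float:
    """Convert a log-style azimuth such as 180° 47' 51" to decimal degrees."""
    degrees, minutes, seconds = (int(part) for part in re.findall(r"\d+", text))
    return degrees + minutes / 60 + seconds / 3600


def follow_decision(current: float, target: float, tolerance: float,
                    dome_slewing: bool, own_slew_pending: bool) -> tuple[str, str]:
    """Decide what one poll of a dome follower should do, and why.

    Hypothetical logic modelled on the behavior described in this thread;
    the order of the checks is an assumption.
    """
    if own_slew_pending:
        # Waiting on a follow slew this follower issued: no new request
        return "wait", "previous follow slew still in progress"
    if dome_slewing:
        # Something else (e.g. a shutter move) is moving the dome
        return "skip", "cannot synchronize with telescope while dome is slewing"
    # Angular error folded into [0, 180], so 359° vs 1° counts as 2°, not 358°
    error = abs((target - current + 180.0) % 360.0 - 180.0)
    if error <= tolerance:
        return "skip", f"within tolerance ({error:.3f}° <= {tolerance:.1f}°)"
    return "slew", f"follow slew to {target:.3f}° (error {error:.3f}°)"


current = dms_to_deg("180° 47' 51\"")
target = dms_to_deg("181° 50' 18\"")
for tolerance, slewing in [(1.0, False), (1.0, True), (2.0, False)]:
    action, reason = follow_decision(current, target, tolerance, slewing, False)
    print(f"tolerance={tolerance} slewing={slewing}: {action} ({reason})")
```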

  5. Stephen Bogner reporter

    Thanks George for looking at this. I have now changed the tolerance to 2, as you recommended. I am running a test as I write this; it is now behaving very well and the log file is very clean. The dome is rotating almost exactly 2 degrees roughly every 5 minutes, which is exactly what I would hope to see. It has done this for 8 increments so far, so I am going to leave it running for a couple of hours, including a meridian flip, to see whether any issues come up. It would be very nice if something this simple fixes the issue for me, although it does leave open the question of what exactly causes such erratic behavior with the tighter 1-degree tolerance. I will let you know how I make out on the longer test. Cheers!

  6. Stephen Bogner reporter

    I have completed the longer test, which came off with no issues. The log file is very clean. Here is a link to the log, in case anyone has a similar issue and would like to see what the log ought to look like when things are working: https://1drv.ms/u/s!Aiw9JNFztT21ik5dN2Y_CWiMd6QD?e=AGx22s

    Again, the fix was to use a larger tolerance of 2 degrees, rather than the 1 degree default. It is not clear why having a tighter tolerance created the erratic behavior, but this was the simple fix for me. (So far….) Thanks George for the rescue!

  7. George Hilios

    I’m glad this works for you now! Thanks again for your patience and meticulousness. To help ensure others don’t run into this, I’ll update the default to 2 degrees. I’ll also contact Tim Long and see what he thinks.

  8. Stephen Bogner reporter

    Of course, increasing the tolerance would have been too easy… My imaging session last night was again plagued by dome runaways, even after I raised the tolerance further, to 3 degrees. What differed from the successful daytime test was the presence of other equipment and a target at a higher elevation. So my next test will be to run tracking at different elevations from the same azimuth, to see whether there is a transition from normal to erratic operation as a function of target altitude.

  9. George Hilios

    @Stephen Bogner - I filed a Nexdome issue here. I’m hoping Tim’s involvement will help!

    Were you able to downgrade the drivers (including the firmware) to see if this still reproduces?

  10. Stephen Bogner reporter

    I have not downgraded the drivers yet, since I wanted to characterize the failure a little better to give Tim a fighting chance with it. What I have done is run through a test matrix to demonstrate reproducibility/repeatability, and also to investigate whether the issue was related to target elevation calculations. With the tolerance set at 2.0, I was able to run successfully at all elevations from 45 to 85 degrees, in runs of 30 minutes. (I did experience a couple of equipment communication failures, but these were not related to the issue I was experiencing before.) However, when I changed the tolerance to 1.0, I experienced a failure immediately during the first run, at 135 degrees azimuth and 55 degrees elevation. This failure was reproduced 3 times in succession, with out-of-control (OOC) dome rotation arising from the first attempt to track in each case, and no successful tracking. On these runs I shut things down and started over after 3 consecutive OOC incidents, rather than letting it continue to fail for 30 minutes. Interestingly, when I then reset the tolerance to 2.0 and tried to rerun 135 azimuth and 55 elevation, without shutting down and restarting the whole system, I also experienced an immediate failure. Is it possible that the failure condition is causing memory corruption in the driver or the stepper motor controller that is not being flushed, leading to continuing/cascading failures even when rerunning sequences that had run successfully before?

    I want to try this again, but unfortunately there now appears to be a problem with the stepper motor controller itself which seems to have lost its mind, as per this video: https://1drv.ms/v/s!Aiw9JNFztT21ilCjwheVyZKedbT7?e=A9IdRz . I have a support request in with Babak, so I will see if he knows what is causing this and how to sort it out. I prefer to deal with one failure at a time, when possible….

    Anyway, these tests seem to have confirmed that the failure is associated with the Tolerance, and is not related to the target elevation, and that the failure can “poison” future sequences/runs somehow.

    Here is the simple NINA sequence that I ran for each of these tests, obviously just substituting the relevant Azimuth and Elevation.
