revid: session control API redesign

Issue #29 on hold

kortschak created an issue 2019-01-17

Currently, the revid.Revid session API is spread over two packages (a main, revid-cli, and revid). These leads to confusion about how to use this type. I suspect that is has also contributed to the unsafe approach to concurrency that currently exists in revid-cli.

I would like to see the API changed to simplify the control of revid sessions. The simplest approach is similar to the approach currently used, but removes the need for the user to stop the session to reconfigure it.

The API would look something like the exec.Cmd/child process communication. Since signals don't exist between goroutines, there would need to be a way to send information to the running (or not) session.

Briefly:

// Start starts a revid session, returning an error is that has failed.
func (r *Revid) Start() error

// Kill terminates a revid session. A killed revid session may not be restarted.
func (r *Revid) Kill() error

// Configure sets the configuration of a session. It is valid to call Configure
// on an unstarted session.
func (r *Revid) Configure(c Config) error

There is a pause mode in the current implementation, but it does not pause the input stream. There needs to be some thinking about this.

Comments (24)

Saxon Milton
This redesign sgtm. I was wondering if a context.Context would be useful in this situation to stop routines when kill is called ?

In regards to working out how to pause the input stream, could we just kill raspivid when we call Kill(), and then detect this occurrence in our input func ? Alternatively, can we just have our lexing work done as a routine, and if we're concerned about errors, set up an errChannel that we can watch on the revid-cli side of things, or with setupInput?
- 2019-01-19T08:02:58+00:00
Saxon Milton
@kortschak
- 2019-01-20T09:38:17+00:00
kortschak reporter
Pause can be readily done with sending SIGSTOP (and resumed with SIGCONT). Killing the process would lead to an EOF in the internal consumers that should then close the chain. Using posix signals won't work on windows, but I imagine that if we care enough about that we could find out how to do this ([here] is a discussion of it - I don't care enough about windows to do this, so I would just stub that out for the windows build).

Edit: and of cource raspivid won't be a source on windows.
- 2019-01-20T09:52:19+00:00
Saxon Milton
@kortschak @scruzin something like this does sound good, but there's no raspivid on windows, this means that we can't test using webcam on windows either right ?

Also, what are your thoughts on how we can set up the lexing to work in the background ? You mention that you didn't like when I had "go lexTo..." - what's the way around something like this ?
- 2019-01-20T11:06:13+00:00
kortschak reporter
lexTo returns an error, you can't just discard that.

There needs to be a goroutine that handles that and that then waits and returns an error. Otherwise, there needs to be a channel for the error state to be retrieved.
- 2019-01-20T11:26:56+00:00
Saxon Milton
@kortschak @scruzin is there any issue with just killing raspivid and detecting this, rather than using the posix signals, so that we can use a camera stream both on linux and windows ?

Also, what are your thoughts on the use of context.Context to kill the routines ?
- 2019-01-20T11:36:34+00:00
Saxon Milton
@kortschak @scruzin I personally consider this proposal to be of high priority as revid-cli is broken without it.
- 2019-01-21T08:11:14+00:00
Alan Noble
+1 for the use of contexts.
- 2019-01-21T08:58:19+00:00
Saxon Milton
- assigned issue to
  
  AusOcean developer
- 2019-01-21T12:01:54+00:00
Saxon Milton
@kortschak ping
- 2019-01-22T11:09:03+00:00
kortschak reporter
context.Context is not necessary for this.

Starting raspivid precludes windows, so why do we care whether we are using something that doesn't build on windows? It just means that part of the code must be protected by a linux build tag.
- 2019-01-22T12:15:17+00:00
Saxon Milton
If we have a routine doing lexing, and then a routine for outputting clips - what is a nice way of stopping them on kill ? If context.Context doesn't work, what's the alternative?

on windows, we might like to use webcam or similar - in this case, there is still a process that needs to be killed.
- 2019-01-22T12:41:05+00:00
Saxon Milton
@kortschak ping
- 2019-01-23T04:44:46+00:00
Alan Noble
I would like revid to remain as portable as possible. Providing we keep non-portable functionality compartmentalised as external binaries, such as raspivid, there is little need for OS-specific build flag. FFmpeg runs on many different operating systems, including Windows - only the path is different.

I think revid-cli should become a very small shim, with configuration and everything else, including NetSender functionality, moving into Revid.

Also, although slightly off thread, but related to interfaces, I would like to see a "RevidCamera" abstraction layer for raspivid, so we could read from the built-in camera on any operating system, or provide hooks for such an implementation, whether Ubuntu or Windows (has its Integrated Camera Driver).
- 2019-01-24T03:20:08+00:00
kortschak reporter
The way to stop a goroutine on a signal is to have a kill chan in a select and return from the goroutine when the kill chan is closed.

I would like to understand the benefit from having revid run on windows (or any non-posix device). Requiring that it operate under that OS makes the integration of control between child processes and the Go executable very much harder. It is easy if there are things that are currently apparently desired (pausing the camera for example) are removed or done inefficiently (kill and restart the camera process).
- 2019-01-24T03:41:31+00:00
Alan Noble
Also, as a principle all worker tasks should run in their own go routine (or "thread") leaving the main routine ("thread") to handle control requests (whether originating from signals or via NetReceiver network requests). This approach minimises the risk of an errant worker routine causing the whole program to freeze, which is an important consideration given the remote locations where our software will be deployed.
- 2019-01-24T04:03:50+00:00
Alan Noble
Our comments passed mid flight.

Re benefit of portability, the chief rationale is that (over time) we (and others) will want to use revid to consume video data from many different devices on many different platforms. For example, in the not-to-distant future, we will want to ingest the H.264 video data collected by RoVs and re-encode it as MTS inserting our meta data in the process. This data often originates from Windows devices.

Kill chans is definitely an approach, but if workers all run in their own routines, using a Waitgroup seems more elegant. Although some channel is still required presumably to instruct the routine to exit.
- 2019-01-24T04:14:33+00:00
kortschak reporter
Waitgroups do not work for control. They implement the wait after completion. In order for a goroutine to be conditionally exited in a race-safe way you need some kind of semaphore. This can be done with a condition variable in a way that is difficult to reason about safely or via a chan and select which is easy to reason about. Waitgroups don't satisfy either of these.
- 2019-01-24T04:28:42+00:00
Alan Noble
OK, nix waitgroups. I still think each worker task should run in its own go routine though.
- 2019-01-24T04:42:59+00:00
kortschak reporter
With a single server: https://play.golang.org/p/JK7ptyI_0yh (chan but no WG - just for demonstration)

Multiple servers: https://play.golang.org/p/2sxZROC47df (using chan and WG for completion wait - the WG should be used in the first one too, but the example there should that it's not intrinsically required, just for safety).
- 2019-01-24T04:52:38+00:00
Saxon Milton
I propose that we leave alone the current method of dealing with raspivid for now, and concentrate on getting the routines working. This can include the renaming of the start and stop methods, to start and kill. I propose we do as has been discussed regarding the the signal chan and waitgroups. Specifically, there will be two routines, one dealing with input and the other dealing with output. This will at least stop revid from being broken and actually make it stoppable remotely. We can think about dealing with errors later i.e. implementing an error chan, or if we like, we can handle in this PR, but really I think we should prioritise on fixing revid.
- 2019-01-24T05:26:26+00:00
kortschak reporter
I think pausing can be implemented using pipe back pressure. With a scheme similar to this https://play.golang.org/p/HVU27PzRvP8
- 2019-01-24T05:46:49+00:00
kortschak reporter
Extending the approach above, there needs to be a context.Context (or something similar) to handle to error return. Then the call to lexTo can be made a goroutine.

Looking at context.Context and the associated extension functions, I don't think we can use the context.WithCancel if we want pausing, but we can extend context.Context so that it has an the necessary details, essentially including the mechanics that are in the code above.

This architecture strongly suggests that Alan's desire to have all the code in revid.Revid and almost none in the calling executable is the right way to go. Doing anything esle massively expands the API surface required to have all these behaviours.
- 2019-01-24T20:31:39+00:00
Saxon Milton
- changed status to on hold
API has changed massively since this discussion and does not currently require refactor of this nature, however, some of this may still be useful for future improvements; considered to be on hold for now.
- 2020-05-12T09:45:03+00:00
Log in to comment

Assignee: AusOcean developer

Type: proposal

Priority: major

Status: on hold

Votes: 0

Watchers: 3