When to emulate?
So you have your nifty numerical simulator that captures all the details of the complex device you want to model, everything! You have your digital twin, and for a while everything is OK.
Beyond reality
There comes the day when you want to use your simulator in ways that go beyond the constraints imposed by the physical world ... well, that is the reason why we build simulators, isn't it?
You want to:
1. optimize the design of your device,
2. find out the values of parameters of your device that are hard or impossible to measure (i.e. system identification),
3. use the model to predict the behavior of your device under different situations and control it in real time.
At this point you realize that your nifty model is just too slow. Maybe a single simulation is many times faster than reality, let's say a simulation of your device running for months takes a few minutes, but applications 1–3 require thousands of simulations, and even a runtime of a few minutes makes those applications impractical.
So, what do you do? First, do not panic, you are not alone. This problem is quite common and there is a whole field of applied science that deals with it. In a nutshell: you need to speed up your model by making it specific to the problem at hand.
Problem specification
To speed up your simulator you need to check a few properties of the usage you have planned for it. In particular, you want to check whether you actually need all the details that your simulator provides. For example, a simulator based on the Finite Element Method (FEM) will produce a huge number of values that you might not need in order to get the information required by your application.
Loosely speaking, numerical simulators used in applications like the ones enumerated in the previous section have a fan-out/fan-in structure. Let an image explain this idea:
The fan-out part comes from the way inputs, e.g. boundary conditions, actuations, parameter values, etc., are mapped to internal states of the simulator (usually many ... many!). Continuing with the FEM example, a few parameters defining the problem will produce hundreds or thousands of nodal values. Your simulator spends precious time keeping track of this plethora of values.
The fan-in part is given by the application at hand. Usually the application does not need the whole set of internal states; e.g. you might be interested in the signal produced by a sensor, obtained by averaging the states over a small region of your domain. In such (pervasive!) situations the number of output values is much smaller than the dimension (order) of the internal states of the simulator. Your simulator is, by the way, establishing a relation between the inputs and the outputs of your problem.
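The fan-out/fan-in structure can be made concrete with a toy example. The sketch below is purely illustrative (the matrices and dimensions are made up, not taken from any specific FEM code): a few input parameters fan out into many internal states, and a couple of "sensors" fan those states back into a handful of outputs.

```python
import numpy as np

# Toy "simulator" with the fan-out/fan-in structure described above.
# All names and dimensions here are illustrative assumptions.
rng = np.random.default_rng(0)
n_inputs, n_states, n_outputs = 3, 1000, 2

# Fan-out: a few inputs (parameters) determine many internal states,
# e.g. nodal values on a mesh.
fan_out = rng.standard_normal((n_states, n_inputs))

# Fan-in: the application observes only a few outputs, e.g. sensors
# that average the states over small regions of the domain.
fan_in = np.zeros((n_outputs, n_states))
fan_in[0, :50] = 1 / 50      # sensor 1: average of states 0..49
fan_in[1, 500:550] = 1 / 50  # sensor 2: average of states 500..549

def simulate(params):
    states = np.tanh(fan_out @ params)  # many internal states
    return fan_in @ states              # few observed outputs

y = simulate(np.array([0.1, -0.3, 0.7]))
print(y.shape)  # (2,): 3 inputs, 1000 internal states, 2 outputs
```

The simulator spends its effort on the 1000 internal states, while the application only ever looks at 2 numbers: that gap is what reduction and emulation exploit.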
The natural question is then: can we find a smaller model that establishes the same (or a similar) relation between inputs and outputs? And the answer is generally yes, such a reduced model exists.
Reduced models
The general way of reducing a model, and well suited for models based on differential equations (we will mainly refer to these), is via Model Order Reduction (MOR). MOR methodologies use information from your simulator and a set of simulations (snapshots) to create a reduced model that behaves very much like your detailed simulator.
MOR methodologies based on projections build a model with a structure derived from the structure of your simulator and with parameters adapted to reproduce the outputs of your simulator:
These methodologies give you some information about the internal states of the detailed simulator, but "blurred out". It is like looking at a picture with your eyes ajar: you still see the picture, but textures and patterns with small details are blurred out (for the physicists out there: it is a kind of coarse-graining). The reduced model, having much fewer states and capturing the most important aspects of the input-output relation, spends less time in internal calculations and offers a speedup.
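One classic projection-based MOR recipe is Proper Orthogonal Decomposition (POD): collect state snapshots from a few simulations, take the leading left singular vectors as a basis, and project the dynamics onto it. The sketch below uses a random stable linear system as a stand-in for a detailed simulator; the sizes and the explicit Euler time stepping are illustrative choices, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 200, 5  # full order and reduced order (illustrative sizes)

# Full-order linear model x' = A x (a stable random system standing in
# for your detailed simulator).
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))

# Snapshots: states collected along a simulation (explicit Euler here,
# just to keep the sketch short).
dt, steps = 0.01, 100
snapshots = []
x = rng.standard_normal(n)
for _ in range(steps):
    x = x + dt * (A @ x)
    snapshots.append(x)
S = np.column_stack(snapshots)  # (n, steps) snapshot matrix

# POD: leading left singular vectors of the snapshot matrix form the
# projection basis V (n x r).
V, _, _ = np.linalg.svd(S, full_matrices=False)
V = V[:, :r]

# Galerkin projection: reduced dynamics z' = (V^T A V) z, with x ~ V z.
A_r = V.T @ A @ V
print(A_r.shape)  # (5, 5): a 200-state model reduced to 5 states
```

The approximate full state is recovered as `V @ z`, which is exactly the "blurred" view of the internal states mentioned above.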
Hey! Why don't we take this to the extreme and get rid of all internal states? Enter emulation...
Emulators
Emulation takes MOR to its natural limit and uses only snapshots of your simulations to learn the input-output relation.
Emulation is a regression problem; more specifically, it is an interpolation problem and falls within the realm of Machine Learning and Scattered Data Approximation^{2}. Emulation will give you the highest attainable speedup, but it is not always feasible.
If emulation exploits information from your simulator it is called mechanistic or invasive; if it does not, it is called data-driven (see this article^{1} for a discussion). Having information about the simulator that generated the snapshots helps in guessing the input-output relation far away from the given data.
Data-driven emulation is simpler to apply than mechanistic emulation and MOR. That is why you should try it first.
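A minimal data-driven emulator can be built directly from snapshots with an off-the-shelf scattered-data interpolant. In the sketch below the "expensive simulator" `f` is a cheap made-up stand-in so the example runs; in practice you would replace it with calls to your actual simulator.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Stand-in for an expensive simulator with 2 inputs and 1 scalar output.
def f(p):  # p: (n, 2) array of input parameter vectors
    return np.sin(p[:, 0]) * np.cos(p[:, 1])

rng = np.random.default_rng(2)
p_train = rng.uniform(-2, 2, size=(200, 2))  # snapshot inputs
y_train = f(p_train)                         # snapshot outputs

# Radial basis function interpolation of the snapshots: this IS the
# emulator; evaluating it costs far less than a full simulation.
emulator = RBFInterpolator(p_train, y_train)

p_test = rng.uniform(-2, 2, size=(50, 2))
err = np.max(np.abs(emulator(p_test) - f(p_test)))
print(f"max emulation error inside the sampled region: {err:.3f}")
```

Note that the test points lie inside the region covered by the snapshots; as discussed below, a data-driven emulator gives no guarantees far outside that region.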
Recap
MOR and emulation, the extreme case, can help you with your speed problem. But there is no free lunch: to create any of these faster models, you will still need to run some simulations and then apply methods for the creation of the effective model.
Data-driven emulation uses no information about your simulator (besides the data), therefore the error of its predictions for inputs far away from the data is uncontrolled. Mechanistic emulation might produce better trends outside the scope of the data, since it uses some information about the structure of the simulator. This is also the case with MOR.
An important difference between MOR and Emulation is that the former allows you to represent the (blurred) internal states of your simulator, while the latter does not.
Flowchart
Preconditions

Do I need a faster model?
 How much time is needed to generate a single simulation?
 How many simulations does my application require?
 Do not confuse intrinsic speed with implementation speed. One should look at the complexity of the simulator and of the selected emulation/interpolation method.

Is my problem suited for emulation?
 Do I need the whole simulation state? Emulation vs. MOR.
 Is there an observer? Is the dimension reduced (fan-in, especially along time)?
First decisions
 Do I have a computer cluster at hand? Will a cluster be available in the field?
 Are there effective theories or phenomenological models for my problem? Fit those to simulated data or use them to generate residuals.
So we emulate
 Dimension of the input space
 Can we get rid of the time dimension? Spectral methods.
 If time is not reducible, can we use iterative interpolation? KF–GP equivalence.
 Dimension of the output space
 Is the output scalar?
 Is the output a vector? Is there interaction among the components?
 Time budget
 Is my problem suited for vectorized/parallel evaluation?
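Regarding the "get rid of the time dimension" item in the checklist above, one common trick is to compress each simulated time series onto a few basis coefficients and emulate those coefficients instead of the full series. The sketch below uses an SVD basis as a simple stand-in for the spectral methods mentioned; the toy family of damped sinusoids and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative setup: each "simulation" returns a time series of length
# 500, for a scalar input parameter p in [0, 1]; 100 snapshot runs.
t = np.linspace(0, 10, 500)
params = rng.uniform(0, 1, 100)
series = np.array([np.sin((1 + p) * t) * np.exp(-p * t) for p in params])

# Compress the time dimension: SVD of the (simulations x time) matrix;
# each full series is summarized by r coefficients on the leading modes.
U, s, Vt = np.linalg.svd(series, full_matrices=False)
r = 10
coeffs = series @ Vt[:r].T  # (100, r): this is what the emulator learns

# Reconstruction check: a few modes capture most of each time series,
# so the emulator only needs to map inputs to r coefficients.
recon = coeffs @ Vt[:r]
rel_err = np.linalg.norm(series - recon) / np.linalg.norm(series)
print(f"relative reconstruction error with {r} modes: {rel_err:.3e}")
```

After compression, the emulation problem shrinks from "input to 500 time samples" to "input to 10 coefficients", which any scattered-data interpolant handles comfortably.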
References

Carbajal, J. P., Leitão, J. P., Albert, C., & Rieckermann, J. (2017). Appraisal of data-driven and mechanistic emulators of nonlinear simulators: The case of hydrodynamic urban drainage models. Environmental Modelling & Software, 92, 17–27. http://doi.org/10.1016/j.envsoft.2017.02.006 ↩

Holger Wendland (2005). Scattered Data Approximation. Cambridge Monographs on Applied and Computational Mathematics 17. ↩