Commits

Nico Schlömer committed a028a14 Merge

Merged fenics-project/dolfin into master

Comments (0)

Files changed (54)

demo/documented/biharmonic/common.txt

 Equation and problem definition
 -------------------------------
 
-The biharmonic equation is a fourth-order elliptic equation. On the domain
-:math:`\Omega \subset \mathbb{R}^{d}`, :math:`1 \le d \le 3`, it reads
+The biharmonic equation is a fourth-order elliptic equation. On the
+domain :math:`\Omega \subset \mathbb{R}^{d}`, :math:`1 \le d \le 3`,
+it reads
 
 .. math::
    \nabla^{4} u = f \quad {\rm in} \ \Omega,
 
-where :math:`\nabla^{4} \equiv \nabla^{2} \nabla^{2}` is the biharmonic
-operator and :math:`f` is a prescribed source term. To formulate a complete
-boundary value problem, the biharmonic equation must be complemented by
-suitable boundary conditions.
+where :math:`\nabla^{4} \equiv \nabla^{2} \nabla^{2}` is the
+biharmonic operator and :math:`f` is a prescribed source term. To
+formulate a complete boundary value problem, the biharmonic equation
+must be complemented by suitable boundary conditions.
 
 Multiplying the biharmonic equation by a test function and integrating
 by parts twice leads to a problem second-order derivatives, which
 discontinuous Galerkin approach to impose continuity of the normal
 derivative weakly.
 
-Consider a triangulation :math:`\mathcal{T}` of the domain :math:`\Omega`,
-where the union of interior facets is denoted by :math:`\Gamma`.
-Functions evaluated on opposite sides of a facet are indicated by the
-subscripts ':math:`+`' and ':math:`-`'.
-Using the standard continuous Lagrange finite element space
+Consider a triangulation :math:`\mathcal{T}` of the domain
+:math:`\Omega`, where the union of interior facets is denoted by
+:math:`\Gamma`.  Functions evaluated on opposite sides of a facet are
+indicated by the subscripts ':math:`+`' and ':math:`-`'.  Using the
+standard continuous Lagrange finite element space
 
 .. math::
     V_ = \left\{v \in H^{1}_{0}(\Omega): \ v \in P_{k}(K) \ \forall \ K \in \mathcal{T} \right\}
    u            &= 0 \quad {\rm on} \ \partial\Omega \\
    \nabla^{2} u &= 0 \quad {\rm on} \ \partial\Omega
 
-a weak formulation of the biharmonic reads: find :math:`u \in V` such that
+a weak formulation of the biharmonic reads: find :math:`u \in V` such
+that
 
 .. math::
    \sum_{K \in \mathcal{T}} \int_{K} \nabla^{2} u \nabla^{2} v \, dx \
   + \int_{\Gamma} \frac{\alpha}{h}[\!\![ \nabla u ]\!\!]  [\!\![ \nabla v ]\!\!]  \, ds
   = \int_{\Omega} vf \, dx \quad \forall \ v \in V
 
-where :math:`\left< u \right> = (1/2) (u_{+} + u_{-})`, :math:`[\!\![  w
-]\!\!]  = w_{+} \cdot n_{+} + w_{-} \cdot n_{-}`, :math:`\alpha \ge 0`
-is a penalty term and :math:`h` is a measure of the cell size.  For the
-implementation, it is useful to identify the bilinear form
+where :math:`\left< u \right> = (1/2) (u_{+} + u_{-})`, :math:`[\!\![
+w ]\!\!]  = w_{+} \cdot n_{+} + w_{-} \cdot n_{-}`, :math:`\alpha \ge
+0` is a penalty term and :math:`h` is a measure of the cell size.  For
+the implementation, it is useful to identify the bilinear form
 
 .. math::
    a(u, v) = \sum_{K \in \mathcal{T}} \int_{K} \nabla^{2} u \nabla^{2} v \, dx \

demo/documented/biharmonic/cpp/documentation.rst

 Implementation
 --------------
 
-The implementation is split in two files, a form file containing the definition
-of the variational forms expressed in UFL and the solver which is implemented
-in a C++ file.
+The implementation is split in two files, a form file containing the
+definition of the variational forms expressed in UFL and the solver
+which is implemented in a C++ file.
 
 Running this demo requires the files: :download:`main.cpp`,
 :download:`Biharmonic.ufl` and :download:`CMakeLists.txt`.
     # Elements
     element = FiniteElement("Lagrange", triangle, 2)
 
-On the space ``element``, trial and test functions, and the source term
-are defined:
+On the space ``element``, trial and test functions, and the source
+term are defined:
 
 .. code-block:: python
 
 C++ program
 ^^^^^^^^^^^
 
-The DOLFIN interface and the code generated from the UFL input is included,
-and the DOLFIN namespace is used:
+The DOLFIN interface and the code generated from the UFL input is
+included, and the DOLFIN namespace is used:
 
 .. code-block:: c++
 
 
     void eval(Array<double>& values, const Array<double>& x) const
     {
-    values[0] = 4.0*std::pow(DOLFIN_PI, 4)*
-      std::sin(DOLFIN_PI*x[0])*std::sin(DOLFIN_PI*x[1]);
+      values[0] = 4.0*std::pow(DOLFIN_PI, 4)*
+        std::sin(DOLFIN_PI*x[0])*std::sin(DOLFIN_PI*x[1]);
     }
 
   };
 
-A boundary subdomain is defined, which in this case is the entire boundary:
+A boundary subdomain is defined, which in this case is the entire
+boundary:
 
 .. code-block:: c++
 
   class DirichletBoundary : public SubDomain
   {
     bool inside(const Array<double>& x, bool on_boundary) const
-    {
-      return on_boundary;
-    }
+    { return on_boundary; }
   };
 
-The main part of the program is begun, and a mesh is created with 32 vertices
-in each direction:
+The main part of the program is begun, and a mesh is created with 32
+vertices in each direction:
 
 .. code-block:: c++
 
   int main()
   {
+    // Make mesh ghosted for evaluation of DG terms
+    parameters["ghost_mode"] = "shared_facet";
+
     // Create mesh
     UnitSquareMesh mesh(32, 32);
 
-
 The source function, a function for the cell size and the penalty term
 are declared:
 
     Source f;
     Constant alpha(8.0);
 
-A function space object, which is defined in the generated code, is created:
+A function space object, which is defined in the generated code, is
+created:
 
 .. code-block:: c++
 
     Biharmonic::FunctionSpace V(mesh);
 
 The Dirichlet boundary condition on :math:`u` is constructed by
-defining a :cpp:class:`Constant` which is equal to zero, defining the boundary
-(``DirichletBoundary``), and using these, together with ``V``, to create
-``bc``:
+defining a :cpp:class:`Constant` which is equal to zero, defining the
+boundary (``DirichletBoundary``), and using these, together with
+``V``, to create ``bc``:
 
 .. code-block:: c++
 
     DirichletBoundary boundary;
     DirichletBC bc(V, u0, boundary);
 
-Using the function space ``V``, the bilinear and linear forms
-are created, and function are attached:
+Using the function space ``V``, the bilinear and linear forms are
+created, and function are attached:
 
 .. code-block:: c++
 

demo/documented/biharmonic/cpp/main.cpp

 class DirichletBoundary : public SubDomain
 {
   bool inside(const Array<double>& x, bool on_boundary) const
-  {
-    return on_boundary;
-  }
+  { return on_boundary; }
 };
 
 int main()
 {
+  // Make mesh ghosted for evaluation of DG terms
+  parameters["ghost_mode"] = "shared_facet";
+
   // Create mesh
   UnitSquareMesh mesh(32, 32);
 

demo/documented/biharmonic/python/demo_biharmonic.py

 parameters["form_compiler"]["cpp_optimize"] = True
 parameters["form_compiler"]["optimize"] = True
 
+# Make mesh ghosted for evaluation of DG terms
+parameters["ghost_mode"] = "shared_facet"
+
 # Create mesh and define function space
 mesh = UnitSquareMesh(32, 32)
 V = FunctionSpace(mesh, "CG", 2)

demo/documented/biharmonic/python/documentation.rst

 
 .. code-block:: python
 
+    # Make mesh ghosted for evaluation of DG terms
+    parameters["ghost_mode"] = "shared_facet"
+
     # Create mesh and define function space
     mesh = UnitSquareMesh(32, 32)
     V = FunctionSpace(mesh, "CG", 2)
 
 A subclass of :py:class:`SubDomain <dolfin.cpp.SubDomain>`,
 ``DirichletBoundary`` is created for later defining the boundary of
-the domian:
+the domain:
 
 .. code-block:: python
 

demo/undocumented/compiled-extension-module/python/Probe/Probe.h

 
 namespace dolfin
 {
-
   class Cell;
   class FiniteElement;
   class Function;

demo/undocumented/dg-advection-diffusion/cpp/main.cpp

 
 int main(int argc, char *argv[])
 {
+  // FIXME: Make mesh ghosted
+  parameters["ghost_mode"] = "shared_facet";
+
   // Read simple velocity field (-1.0, -0.4) defined on a 64x64 unit square
   // mesh and a quadratic vector Lagrange element
 

demo/undocumented/dg-advection-diffusion/python/demo_dg-advection-diffusion.py

 
 from dolfin import *
 
+# FIXME: Make mesh ghosted
+parameters["ghost_mode"] = "shared_facet"
+
 class DirichletBoundary(SubDomain):
     def inside(self, x, on_boundary):
         return abs(x[0] - 1.0) < DOLFIN_EPS and on_boundary

demo/undocumented/dg-poisson/cpp/main.cpp

     { return on_boundary and near(x[1]*(1 - x[1]), 0); }
   };
 
+
+  // FIXME: Make mesh ghosted
+  parameters["ghost_mode"] = "shared_facet";
+
   // Create mesh
   UnitSquareMesh mesh(24, 24);
 

demo/undocumented/dg-poisson/python/demo_dg-poisson.py

 
 from dolfin import *
 
+# FIXME: Make mesh ghosted
+parameters["ghost_mode"] = "shared_facet"
+
 # Define class marking Dirichlet boundary (x = 0 or x = 1)
 class DirichletBoundary(SubDomain):
   def inside(self, x, on_boundary):
 # Compute solution
 u = Function(V)
 solve(a == L, u)
+print "Solution vector norm (0): {!r}".format(u.vector().norm("l2"))
 
 # Project solution to piecewise linears
-u_proj = project(u)
+#u_proj = project(u)
 
 # Save solution to file
-file = File("poisson.pvd")
-file << u_proj
+#file = File("poisson.pvd")
+#file << u_proj
 
 # Plot solution
-plot(u_proj, interactive=True)
+#plot(u_proj, interactive=True)

demo/undocumented/ghost-mesh/python/demo-ghostmesh.py

+#!/usr/bin/python
+#
+# very rough demo to test out ghost cells
+# run with mpirun
+#
+from dolfin import *
+import matplotlib.pyplot as plt
+from matplotlib.collections import PolyCollection
+import matplotlib as mpl
+import numpy as np
+import sys
+
+parameters["ghost_mode"] = "shared_vertex"
+#parameters["ghost_mode"] = "shared_facet"
+# parameters["ghost_mode"] = "None"
+parameters["reorder_cells_gps"] = True
+parameters["reorder_vertices_gps"] = True
+
+n = 0
+
+if(len(sys.argv) == 2):
+    try:
+        n = int(sys.argv[1])
+    except:
+        n = 0
+
+if(MPI.size(mpi_comm_world()) == 1):
+    print "Only works with MPI"
+    quit()
+
+mpi_rank = MPI.rank(mpi_comm_world())
+
+#parameters["mesh_partitioner"] = "ParMETIS"
+
+mesh = UnitSquareMesh(8, 8)
+# mesh = refine(M)
+
+shared_vertices = mesh.topology().shared_entities(0).keys()
+shared_cells = mesh.topology().shared_entities(mesh.topology().dim())
+
+num_regular_vertices = mesh.topology().ghost_offset(0)
+
+ghost_vertices = range(num_regular_vertices, mesh.topology().size(0))
+
+verts_note = []
+if (n == 0):
+    for k,val in mesh.topology().shared_entities(0).iteritems():
+        vtx = Vertex(mesh, k)
+        verts_note.append( (vtx.point().x(), vtx.point().y(), " "+str(val)) )
+elif (n == 1):
+    for i in range(mesh.num_vertices()):
+        vtx = Vertex(mesh, i)
+        val = vtx.global_index()
+        verts_note.append( (vtx.point().x(), vtx.point().y(), " "+str(val)) )
+else:
+    for i in range(mesh.num_vertices()):
+        vtx = Vertex(mesh, i)
+        val = vtx.index()
+        verts_note.append( (vtx.point().x(), vtx.point().y(), " "+str(val)) )
+
+x,y = mesh.coordinates().transpose()
+
+process_number = MPI.rank(mesh.mpi_comm())
+
+cell_ownership = np.ones(mesh.num_cells(),dtype='int')*process_number
+cell_owner = mesh.topology().cell_owner()
+if len(cell_owner) > 0 :
+    cell_ownership[-len(cell_owner):] = cell_owner
+
+cells_store=[]
+cells_note=[]
+colors=[]
+cmap=['red', 'green', 'yellow', 'purple', 'pink', 'grey', 'blue', 'brown']
+
+idx = 0
+for c in cells(mesh, "all"):
+    xc=[]
+    yc=[]
+    for v in vertices(c):
+        xc.append(v.point().x())
+        yc.append(v.point().y())
+    xavg = c.midpoint().x()
+    yavg = c.midpoint().y()
+    cell_str=str(c.index())
+#    if c.index() in shared_cells.keys():
+#        cell_str = str(shared_cells[c.index()])
+#    else:
+#        cell_str = str(c.index())
+    cells_note.append((xavg, yavg, cell_str))
+    cells_store.append(zip(xc,yc))
+
+    colors.append(cmap[cell_ownership[c.index()]])
+    idx += 1
+
+num_regular_facets = mesh.topology().ghost_offset(1)
+facet_note = []
+shared_facets = mesh.topology().shared_entities(1)
+for f in facets(mesh, "all"):
+    if (f.num_global_entities(2) == 2):
+        color='#ffff88'
+    else:
+        color='#ff88ff'
+    if (not f.is_ghost()):
+        if (f.num_global_entities(2) == 2):
+            color='#ffff00'
+        else:
+            color='#ff00ff'
+
+    if (n < 3):
+        facet_note.append((f.midpoint().x(), f.midpoint().y(), f.global_index(), color))
+    elif (n == 3):
+        facet_note.append((f.midpoint().x(), f.midpoint().y(), f.index(), color))
+    else:
+        if (f.index() in shared_facets.keys()):
+            facet_note.append((f.midpoint().x(), f.midpoint().y(), shared_facets[f.index()], color))
+
+fig, ax = plt.subplots()
+
+# Make the collection and add it to the plot.
+coll = PolyCollection(cells_store, facecolors=colors, edgecolors='#cccccc')
+ax.add_collection(coll)
+
+plt.plot(x, y, marker='o', color='black', linestyle='none')
+plt.plot(x[shared_vertices], y[shared_vertices], marker='o', color='green', linestyle='none')
+plt.plot(x[ghost_vertices], y[ghost_vertices], marker='o', color='yellow', linestyle='none')
+
+xlim = ax.get_xlim()
+ylim = ax.get_ylim()
+
+plt.xlim((xlim[0] - 0.1, xlim[1] + 0.1))
+plt.ylim((ylim[0] - 0.1, ylim[1] + 0.1))
+
+for note in cells_note:
+    plt.text(note[0], note[1], note[2], verticalalignment='center',
+             horizontalalignment='center', size=8)
+
+for note in verts_note:
+    plt.text(note[0], note[1], note[2], size=8, verticalalignment='center')
+
+for note in facet_note:
+    plt.text(note[0], note[1], note[2], size=8, verticalalignment='center', backgroundcolor=note[3])
+
+# Q = FacetFunction("double", mesh)
+
+# xdmf = File("a.xdmf")
+# xdmf << Q
+
+plt.show()

dolfin/common/Set.h

     /// Create empty set
     Set() {}
 
-    /// Wrap std::vectpr as a set. Contents will be erased.
+    /// Wrap std::vector as a set. Contents will be erased.
     Set(std::vector<T>& x) : _x(x)
     { _x.clear(); }
 

dolfin/fem/Assembler.cpp

 // Modified by Joachim B Haga 2012
 // Modified by Martin Alnaes 2013-2014
 
-
 #include <dolfin/log/dolfin_log.h>
 #include <dolfin/common/Timer.h>
 #include <dolfin/parameter/GlobalParameters.h>
     if (!integral)
       continue;
 
+    // Check that cell is not a ghost
+    dolfin_assert(!cell->is_ghost());
+
     // Update to current cell
     cell->get_cell_data(ufc_cell);
     cell->get_vertex_coordinates(vertex_coordinates);
     if (values && ufc.form.rank() == 0)
       (*values)[cell->index()] = ufc.A[0];
     else
-      add_to_global_tensor(A, ufc.A, dofs);
+      A.add_local(ufc.A.data(), dofs);
 
     p++;
   }
     dolfin_assert(facet->num_entities(D) == 1);
     Cell mesh_cell(mesh, facet->entities(D)[0]);
 
+    // Check that cell is not a ghost
+    dolfin_assert(!mesh_cell.is_ghost());
+
     // Get local index of facet with respect to the cell
     const std::size_t local_facet = mesh_cell.index(*facet);
 
                               ufc_cell.orientation);
 
     // Add entries to global tensor
-    add_to_global_tensor(A, ufc.A, dofs);
+    A.add_local(ufc.A.data(), dofs);
 
     p++;
   }
   if (!ufc.form.has_interior_facet_integrals())
     return;
 
-  not_working_in_parallel("Assembly over interior facets");
-
   // Set timer
   Timer timer("Assemble interior facets");
 
   // Extract mesh and coefficients
   const Mesh& mesh = a.mesh();
 
+  // MPI rank
+  const int my_mpi_rank = MPI::rank(mesh.mpi_comm());
+
   // Form rank
   const std::size_t form_rank = ufc.form.rank();
 
              mesh.num_facets());
   for (FacetIterator facet(mesh); !facet.end(); ++facet)
   {
-    // Only consider interior facets
-    if (facet->exterior())
-    {
-      p++;
+    if (facet->num_entities(D) == 1)
       continue;
-    }
+
+    // Check that facet is not a ghost
+    dolfin_assert(!facet->is_ghost());
 
     // Get integral for sub domain (if any)
     if (use_domains)
       continue;
 
     // Get cells incident with facet
-    std::pair<const Cell, const Cell>
-      cells = facet->adjacent_cells(facet_orientation);
-    const Cell& cell0 = cells.first;
-    const Cell& cell1 = cells.second;
+    //std::pair<const Cell, const Cell>
+    //  cells = facet->adjacent_cells(facet_orientation);
+    //const Cell& cell0 = cells.first;
+    //const Cell& cell1 = cells.second;
+    dolfin_assert(facet->num_entities(D) == 2);
+    const Cell cell0(mesh, facet->entities(D)[0]);
+    const Cell cell1(mesh, facet->entities(D)[1]);
 
     // Get local index of facet with respect to each cell
     std::size_t local_facet0 = cell0.index(*facet);
                               ufc_cell[0].orientation,
                               ufc_cell[1].orientation);
 
+    if (cell0.is_ghost() != cell1.is_ghost())
+    {
+      int ghost_rank = -1;
+      if (cell0.is_ghost())
+        ghost_rank = cell0.owner();
+      else
+        ghost_rank = cell1.owner();
+
+      dolfin_assert(my_mpi_rank != ghost_rank);
+      dolfin_assert(ghost_rank != -1);
+      if (ghost_rank < my_mpi_rank)
+        continue;
+      //++counter1;
+      //for (std::size_t i = 0; i < ufc.macro_A.size(); ++i)
+      //  ufc.macro_A[i] *= 0.5;
+    }
+
     // Add entries to global tensor
-    add_to_global_tensor(A, ufc.macro_A, macro_dof_ptrs);
+    A.add_local(ufc.macro_A.data(), macro_dof_ptrs);
 
     p++;
   }
 }
 //-----------------------------------------------------------------------------
-void Assembler::add_to_global_tensor(GenericTensor& A,
-                                     std::vector<double>& cell_tensor,
-                                     std::vector<const std::vector<dolfin::la_index>* >& dofs)
-{
-  A.add_local(&cell_tensor[0], dofs);
-}
-//-----------------------------------------------------------------------------

dolfin/fem/Assembler.h

                                   const MeshFunction<std::size_t>* domains,
                                   std::vector<double>* values);
 
-  protected:
-
-    /// Add cell tensor to global tensor. Hook to allow the SymmetricAssembler
-    /// to split the cell tensor into symmetric/antisymmetric parts.
-    void add_to_global_tensor(GenericTensor& A,
-                              std::vector<double>& cell_tensor,
-                              std::vector<const std::vector<dolfin::la_index>* >& dofs);
-
   };
 
 }

dolfin/fem/AssemblerBase.cpp

           = dofmaps[i]->local_to_global_unowned();
         tensor_layout->local_to_global_map[i].resize(local_size
                                                   + bs*local_to_global_unowned.size());
-        for (std::size_t j = 0; j < tensor_layout->local_to_global_map[i].size(); ++j)
-          tensor_layout->local_to_global_map[i][j] = dofmaps[i]->local_to_global_index(j);
+        for (std::size_t j = 0;
+             j < tensor_layout->local_to_global_map[i].size(); ++j)
+        {
+          tensor_layout->local_to_global_map[i][j]
+            = dofmaps[i]->local_to_global_index(j);
+        }
       }
     }
 
     }
 
     // unique_ptr deletes its object when it exits its scope
-    std::unique_ptr<ufc::finite_element> fe(a.ufc_form()->create_finite_element(i + a.rank()));
+    std::unique_ptr<ufc::finite_element>
+      fe(a.ufc_form()->create_finite_element(i + a.rank()));
 
-    // Checks out-commented since they only work for Functions, not Expressions
+    // Checks out-commented since they only work for Functions, not
+    // Expressions
     const std::size_t r = coefficients[i]->value_rank();
     const std::size_t fe_r = fe->value_rank();
     if (fe_r != r)
   // Check that the cell dimension matches the mesh dimension
   if (a.rank() + a.ufc_form()->num_coefficients() > 0)
   {
-    std::unique_ptr<ufc::finite_element> element(a.ufc_form()->create_finite_element(0));
+    std::unique_ptr<ufc::finite_element>
+      element(a.ufc_form()->create_finite_element(0));
     dolfin_assert(element);
-    if (mesh.type().cell_type() == CellType::interval && element->cell_shape() != ufc::interval)
+    if (mesh.type().cell_type() == CellType::interval && element->cell_shape()
+        != ufc::interval)
     {
       dolfin_error("AssemblerBase.cpp",
                    "assemble form",
                    "Mesh cell type (intervals) does not match cell type of form");
     }
-    if (mesh.type().cell_type() == CellType::triangle && element->cell_shape() != ufc::triangle)
+    if (mesh.type().cell_type() == CellType::triangle && element->cell_shape()
+        != ufc::triangle)
     {
       dolfin_error("AssemblerBase.cpp",
                    "assemble form",
                    "Mesh cell type (triangles) does not match cell type of form");
     }
-    if (mesh.type().cell_type() == CellType::tetrahedron && element->cell_shape() != ufc::tetrahedron)
+    if (mesh.type().cell_type() == CellType::tetrahedron
+        && element->cell_shape() != ufc::tetrahedron)
     {
       dolfin_error("AssemblerBase.cpp",
                    "assemble form",

dolfin/fem/DofMapBuilder.cpp

     global_nodes0 = remapped_global_nodes;
   }
 
+  /*
+  int num_cells = 0;
+  for (CellIterator c(mesh); !c.end(); ++c)
+  {
+    if (!c->is_ghost())
+      ++num_cells;
+  }
+  std::cout << "Num cells: " << MPI::rank(MPI_COMM_WORLD) << ", "
+            << mesh.num_cells() << ", " << mesh.size_global(2) << std::endl;
+  */
+
   // Re-order and switch to local indexing in dofmap when distributed
   // for process locality and set local_range
   if (reorder)
   {
-    // Mark boundary nodes. Boundary nodes are assigned a random
-    // positive integer, interior nodes are marked as -1.
-    std::vector<int> boundary_nodes;
-    compute_boundary_nodes(boundary_nodes, node_graph0,
-                           node_local_to_global0.size(), *ufc_node_dofmap,
-                           mesh, MPI::rank(mesh.mpi_comm()));
-
-    // Compute
+    // Mark shared nodes. Boundary nodes are assigned a random
+    // positive integer, interior nodes are marked as -1, interior
+    // nodes in ghost layer of other processes are marked -2, and
+    // ghost nodes are marked as -3
+    std::vector<int> shared_nodes;
+    compute_shared_nodes(shared_nodes, node_graph0,
+                         node_local_to_global0.size(), *ufc_node_dofmap,
+                         mesh, MPI::rank(mesh.mpi_comm()));
+    /*
+    int tmp_num_owned_nodes = 0;
+    for (std::size_t i = 0; i < shared_nodes.size(); ++i)
+    {
+      if (shared_nodes[i] == -1 or shared_nodes[i] == -2)
+        ++tmp_num_owned_nodes;
+    }
+    std::cout << "Num owned nodes: " << MPI::rank(MPI_COMM_WORLD) << ", "
+              << tmp_num_owned_nodes << std::endl;
+
+    if (MPI::rank(mesh.mpi_comm()) == 0)
+    {
+      for (std::size_t i = 0; i < node_local_to_global0.size(); ++i)
+      {
+        std::cout << "l-to-g: " << i << ", " << node_local_to_global0[i]
+                  << std::endl;
+      }
+
+      //for (std::size_t i = 0; i < shared_nodes.size(); ++i)
+      //{
+      //  std::cout << "Shared node: " << node_local_to_global0[i]
+      //            << ", "  << shared_nodes[i] << std::endl;
+      //}
+      std::cout << "Local size: " << node_local_to_global0.size() << std::endl;
+      for (CellIterator c(mesh); !c.end(); ++c)
+      {
+        std::cout << "Index: " << c->is_ghost() << ", " << c->global_index()
+                  << std::endl;
+        std::cout << c->midpoint().str(true) << std::endl;
+      }
+    }
+    */
+
+    // Compute:
     // (a) owned and shared nodes (and owned and un-owned):
     //    -1: unowned, 0: owned and shared, 1: owned and not shared;
     // (b) map from shared node to sharing processes; and
                                shared_node_to_processes0,
                                dofmap._neighbours,
                                node_graph0,
-                               boundary_nodes, global_nodes0,
+                               shared_nodes, global_nodes0,
                                node_local_to_global0, mesh);
+    //std::cout << "Share noded to proc size: " << MPI::rank(mesh.mpi_comm())
+    //          << ", " << shared_node_to_processes0.size() << std::endl;
+    //for (std::size_t i = 0; i < node_ownership0.size(); ++i)
+    //{
+    //  std::cout << "Node ownership: " << MPI::rank(mesh.mpi_comm()) << ", "
+    //            << i << ", "  << node_ownership0[i] << std::endl;
+    //}
 
     // Set global offset for dofs owned by this process, and the local
     // ownership size
     dofmap._local_ownership_size = bs*num_owned_nodes;
 
     // Sanity check
+    //std::cout << "Local (sum), global dof size: "
+    //          << MPI::sum(mesh.mpi_comm(), dofmap._local_ownership_size)
+    //          << ", " << dofmap._global_dimension << std::endl;
     dolfin_assert(MPI::sum(mesh.mpi_comm(),
                            (std::size_t) dofmap._local_ownership_size)
                   == dofmap._global_dimension);
   else
     sub_dofmap._ufc_local_to_local.clear();
 
-  std::cout << "Size of local-to-local: " << parent_dofmap._ufc_local_to_local.size() << std::endl;
+  //std::cout << "Size of local-to-local: "
+  //          << parent_dofmap._ufc_local_to_local.size() << std::endl;
   if (parent_dofmap._ufc_local_to_local.empty())
   {
     dolfin_error("DofMapBuilder.cpp",
   dofmap.resize(mesh.num_cells(),
                 std::vector<la_index>(ufc_dofmap.local_dimension()));
   std::vector<std::size_t> dof_holder(ufc_dofmap.local_dimension());
-  for (CellIterator cell(mesh); !cell.end(); ++cell)
+  for (CellIterator cell(mesh, "all"); !cell.end(); ++cell)
   {
     const std::size_t cell_index = cell->index();
     ufc_cell.orientation = cell->mesh().cell_orientations()[cell_index];
   std::unordered_map<int, std::vector<int>>& shared_node_to_processes,
   std::set<int>& neighbours,
   const std::vector<std::vector<la_index>>& dofmap,
-  const std::vector<int>& boundary_nodes,
+  const std::vector<int>& shared_nodes,
   const std::set<std::size_t>& global_nodes,
   const std::vector<std::size_t>& local_to_global,
   const Mesh& mesh)
   // Global-to-local node map for nodes on boundary
   std::map<std::size_t, int> global_to_local;
 
-  // Initialise node ownership array, assuming all nodes are owned and
-  // not shared (1)
+  // Initialise node ownership array, provisionally all owned
   node_ownership.resize(num_nodes_local);
+  std::fill(node_ownership.begin(), node_ownership.end(), 1);
 
   // Communication buffer
   std::vector<std::size_t> send_buffer;
 
   // Loop over nodes and buffer nodes on process boundaries
+  std::vector<bool> boundary_node(num_nodes_local, false);
   for (std::size_t i = 0; i < num_nodes_local; ++i)
   {
-    if (boundary_nodes[i] >= 0)
+    if (shared_nodes[i] >= 0)
     {
+      // Possibly shared node
+
       // Mark as provisionally owned and shared (0)
       node_ownership[i] = 0;
+      boundary_node[i] = true;
 
       // Buffer global index and 'vote'
       send_buffer.push_back(local_to_global[i]);
-      send_buffer.push_back(boundary_nodes[i]);
+      send_buffer.push_back(shared_nodes[i]);
+
+      // Add to (global node)-to-(local node) map
       global_to_local.insert(std::make_pair(local_to_global[i], i));
     }
+    else if (shared_nodes[i] == -2)
+    {
+      // Owned, but will need to send index to other processes
+
+      // Mark as owned
+      node_ownership[i] = 0;
+
+      // Buffer global index and trivial 'vote'
+      //send_buffer.push_back(local_to_global[i]);
+      //send_buffer.push_back(-10);
+    }
+    else if (shared_nodes[i] == -3)
+    {
+      // Ghost, need to get index from other process
+      //node_ownership[i] = -1;
+      node_ownership[i] = -1;
+    }
     else
       node_ownership[i] = 1;
   }
         const int received_node_local = it->second;
 
         // If received node is shared, decide ownership
-        if (node_ownership[received_node_local] == 0)
+        if (boundary_node[received_node_local] and node_ownership[received_node_local] == 0)
         {
-          // Move dofs with higher ownership votes from shared to shared
-          // but not owned
-          if (received_vote < boundary_nodes[received_node_local])
+          // Let process with lower 'vote' take ownership
+          if (received_vote < shared_nodes[received_node_local])
+          {
             node_ownership[received_node_local] = -1;
-          else if (received_vote == boundary_nodes[received_node_local]
+          }
+          else if (received_vote == shared_nodes[received_node_local]
                    && process_number > src)
           {
             // If votes are equal, let lower rank process take ownership
         }
         else if (node_ownership[received_node_local] == -1)
         {
+          // FIXME: check above 'else if'
           // Store the process sharing of the node
           shared_node_to_processes[received_node_local].push_back(src);
         }
   }
   else
     dofmaps[0] = ufc_dofmap;
-  offset_local[block_size] = ufc_dofmap->global_dimension(num_mesh_entities_local);
+
+  offset_local[block_size]
+    = ufc_dofmap->global_dimension(num_mesh_entities_local);
 
   num_mesh_entities_global = num_mesh_entities_global_unconstrained;
 
   node_local_to_global.resize(offset_local[1]);
 
   // Build dofmaps from ufc::dofmap
-  for (CellIterator cell(mesh); !cell.end(); ++cell)
+  for (CellIterator cell(mesh, "all"); !cell.end(); ++cell)
   {
     // Set cell orientation
     const int cell_orientation
     ufc_cell_local.entity_indices[D][0] = cell->index();
     ufc_cell_local.index = cell->index();
 
+    /*
+    if (MPI::rank(MPI_COMM_WORLD) == 0)
+    {
+      std::cout << "Local cell: " << ufc_cell_local.entity_indices[D][0]
+                << ", " << ufc_cell_local.index << std::endl;
+    }
+    */
+
     // Get global cell entities/data (global)
     cell->get_cell_data(ufc_cell_global);
     cell->get_cell_topology(ufc_cell_global);
 
+    /*
+    if (MPI::rank(MPI_COMM_WORLD) == 0)
+    {
+      std::cout << "Global cell: " << ufc_cell_global.entity_indices[D][0]
+                << ", " << ufc_cell_global.index << std::endl;
+    }
+    */
+
     // Get reference to container for cell dofs
     std::vector<la_index>& cell_nodes = node_dofmap[cell->index()];
     cell_nodes.resize(local_dim);
   node_local_to_global.resize(offset_local[1]);
 
   // Build dofmaps from ufc::dofmap
-  for (CellIterator cell(mesh); !cell.end(); ++cell)
+  for (CellIterator cell(mesh, "all"); !cell.end(); ++cell)
   {
     // Get reference to container for cell dofs
     std::vector<la_index>& cell_nodes = node_dofmap[cell->index()];
   std::vector<std::size_t> node_local_to_global_mod(offset_local[1]);
   node_ufc_local_to_local.resize(offset_local[1]);
   int counter = 0;
-  for (CellIterator cell(mesh); !cell.end(); ++cell)
+  for (CellIterator cell(mesh, "all"); !cell.end(); ++cell)
   {
     // Get nodes (local) on cell
     std::vector<la_index>& cell_nodes = node_dofmap[cell->index()];
 }
 //-----------------------------------------------------------------------------
 void
-DofMapBuilder::compute_boundary_nodes(std::vector<int>& boundary_nodes,
-                                      const std::vector<std::vector<la_index>>& node_dofmap,
-                                      const std::size_t num_nodes_local,
-                                      const ufc::dofmap& ufc_dofmap,
-                                      const Mesh& mesh,
-                                      const std::size_t seed)
+DofMapBuilder::compute_shared_nodes(std::vector<int>& shared_nodes,
+                                    const std::vector<std::vector<la_index>>& node_dofmap,
+                                    const std::size_t num_nodes_local,
+                                    const ufc::dofmap& ufc_dofmap,
+                                    const Mesh& mesh,
+                                    const std::size_t seed)
 {
   // Initialise mesh
   const std::size_t D = mesh.topology().dim();
   mesh.init(D - 1);
   mesh.init(D - 1, D);
 
-  // Allocate data and initialise all facets to -1
-  boundary_nodes.resize(num_nodes_local);
-  std::fill(boundary_nodes.begin(), boundary_nodes.end(), -1);
+  // Allocate data and initialise all facets to -1 (provisionally,
+  // owned and not shared)
+  shared_nodes.resize(num_nodes_local);
+  std::fill(shared_nodes.begin(), shared_nodes.end(), -1);
 
-  // Create a random number generator for ownership 'voting'
+  // Mark dofs associated ghost cells as ghost dofs (provisionally)
+  for (CellIterator c(mesh, "all"); !c.end(); ++c)
+  {
+    //if (MPI::rank(MPI_COMM_WORLD) == 1)
+    //    std::cout << "Cell: " << c->index() << ", " << c->midpoint().str(true)
+    //              << ", " << c->is_ghost() << ", " << c->is_shared()  << std::endl;
+    if (c->is_ghost())
+    {
+      const std::vector<la_index>& cell_nodes = node_dofmap[c->index()];
+      for (std::size_t i = 0; i < cell_nodes.size(); ++i)
+        if (shared_nodes[cell_nodes[i]] == -1) // ensure not already set (for R space)
+          shared_nodes[cell_nodes[i]] = -3;
+    }
+    else if (c->is_shared())
+    {
+      const std::vector<la_index>& cell_nodes = node_dofmap[c->index()];
+      for (std::size_t i = 0; i < cell_nodes.size(); ++i)
+        shared_nodes[cell_nodes[i]] = -2;
+    }
+  }
+
+  // Create a random number generator for ownership 'voting' for nodes
+  // that are shared by processes
   std::mt19937 engine(seed);
-  std::uniform_int_distribution<> distribution(0, std::numeric_limits<int>::max());
+  std::uniform_int_distribution<> distribution(0,
+                                               std::numeric_limits<int>::max());
 
   // Mark nodes on inter-process boundary
   std::vector<std::size_t> facet_nodes(ufc_dofmap.num_facet_dofs());
-  for (FacetIterator f(mesh); !f.end(); ++f)
+  for (FacetIterator f(mesh, "all"); !f.end(); ++f)
   {
-    // Skip if facet is internal on this partition (i.e., cannot be on
-    // process boundary)
-    if (f->num_entities(D) == 2)
+    // Skip if facet is not shared
+    // NOTE: second test is for periodic problems
+    if (!f->is_shared() and f->num_entities(D) == 2)
+    //if (!f->is_shared())
+    {
       continue;
+    }
 
     // Get cell to which facet belongs (pick first)
-    const Cell c(mesh, f->entities(mesh.topology().dim())[0]);
+    const Cell cell0(mesh, f->entities(D)[0]);
+
+    // Determine if we have a shared facet that needs to 'export' its
+    // dof indices
+    bool shared_facet = false;
+    if (f->is_ghost())
+    {
+      if (f->num_entities(D) == 2)
+      {
+        const Cell cell1(mesh, f->entities(D)[1]);
+        if (cell0.is_ghost() != cell1.is_ghost())
+          shared_facet = true;
+      }
+    }
+    else
+      shared_facet = true;
 
     // Tabulate dofs (local) on cell
-    const std::vector<la_index>& cell_nodes = node_dofmap[c.index()];
+    const std::vector<la_index>& cell_nodes = node_dofmap[cell0.index()];
 
     // Tabulate which dofs are on the facet
-    ufc_dofmap.tabulate_facet_dofs(facet_nodes.data(), c.index(*f));
+    ufc_dofmap.tabulate_facet_dofs(facet_nodes.data(), cell0.index(*f));
 
     // Mark boundary nodes and insert into map
-    for (std::size_t i = 0; i < facet_nodes.size(); ++i)
+    if (shared_facet)
     {
-      // Get facet node local index and assign votes (positive integer)
-      size_t facet_node_local = cell_nodes[facet_nodes[i]];
-      if (boundary_nodes[facet_node_local] < 0)
-        boundary_nodes[facet_node_local] = distribution(engine);
+      for (std::size_t i = 0; i < facet_nodes.size(); ++i)
+      {
+        // Get facet node local index and assign votes (positive integer)
+        size_t facet_node_local = cell_nodes[facet_nodes[i]];
+        if (shared_nodes[facet_node_local] < 0)
+          shared_nodes[facet_node_local] = distribution(engine);
+      }
     }
+
   }
 }
 //-----------------------------------------------------------------------------
       error("Invalid node ownership index.");
   }
   dolfin_assert((unowned_local_size+owned_local_size) == node_ownership.size());
-  dolfin_assert((unowned_local_size+owned_local_size) == old_local_to_global.size());
+  dolfin_assert((unowned_local_size+owned_local_size)
+                == old_local_to_global.size());
 
   // Create global-to-local index map for local un-owned nodes
   std::vector<std::pair<std::size_t, int>> node_pairs;
   node_pairs.reserve(unowned_local_size);
   for (std::size_t i = 0; i < node_ownership.size(); ++i)
   {
+    /*
+    if (MPI::rank(MPI_COMM_WORLD) == 1)
+      std::cout << "Pairs: " << MPI::rank(MPI_COMM_WORLD)
+                << ", " << old_local_to_global[i]  << ", "
+                <<  node_ownership[i]
+                << std::endl;
+    */
     if (node_ownership[i] == -1)
+    {
+      /*
+      if (MPI::rank(MPI_COMM_WORLD) == 1)
+        std::cout << "Unowned pairs: " << MPI::rank(MPI_COMM_WORLD)
+                  << ", " << old_local_to_global[i]  << ", " <<  i
+                  << std::endl;
+      */
       node_pairs.push_back(std::make_pair(old_local_to_global[i] , i));
+    }
   }
   std::map<std::size_t, int>
     global_to_local_nodes_unowned(node_pairs.begin(), node_pairs.end());
   // Compute offset for owned nodes
   const std::size_t process_offset
     = MPI::global_offset(mpi_comm, owned_local_size, true);
+  //std::cout << "Process offset: " << MPI::rank(mpi_comm) << ", "
+  //          << process_offset << std::endl;
+  //std::cout << "Node ownership size: " << MPI::rank(mpi_comm) << ", "
+  //          << node_ownership.size() << std::endl;
 
   // Allocate space
   old_to_new_local.clear();
     dolfin_assert(old_node_index_local < old_to_new_local.size());
     dolfin_assert(node_remap[counter] < (int) local_to_global.size());
     old_to_new_local[old_node_index_local] = node_remap[counter];
-    local_to_global[node_remap[counter]] = node_remap[counter] + process_offset ;
+    local_to_global[node_remap[counter]]
+      = node_remap[counter] + process_offset ;
 
     // If this node is shared and owned, buffer old and new (global)
     // node index for sending
     if (node_ownership[old_node_index_local] == 0)
     {
-      // Buffer old and new global indices t send
+      // Buffer old and new global indices to send
+      /*
+      if (MPI::rank(mpi_comm) == 0)
+      {
+        std::cout << "buffer: " << old_local_to_global[old_node_index_local]
+                  << ", " << process_offset + node_remap[counter]
+                  << std::endl;
+      }
+      */
       send_buffer.push_back(old_local_to_global[old_node_index_local]);
       send_buffer.push_back(process_offset + node_remap[counter]);
     }
     ++counter;
   }
 
+  /*
+  MPI::barrier(mpi_comm);
+  if (MPI::rank(mpi_comm) == 0)
+  {
+    for (std::size_t i = 0; i < old_to_new_local.size(); ++i)
+    {
+      std::cout << "!!!test0: " << MPI::rank(mpi_comm) << ", " << i << ", "
+                << old_to_new_local[i]
+                << ", " << node_ownership[i] << std::endl;
+    }
+  }
+  MPI::barrier(mpi_comm);
+  */
+
   // FIXME: The below algorithm can be improved (made more scalable)
   //        by distributing (dof, process) pairs to 'owner' range
   //        owner, then letting each process get the sharing process
       const std::size_t received_old_node_index_global = recv_buffer[i];
       const std::size_t received_new_node_index_global = recv_buffer[i + 1];
 
+      /*
+      if (MPI::rank(mpi_comm) == 1)
+      {
+        std::cout << "Received: " << received_old_node_index_global << ", "
+                  << received_new_node_index_global  << std::endl;
+      }
+      */
+
       // Get local node index (if on this process), and check if this
       // process has shared node (and is not the owner)
       auto it
         = global_to_local_nodes_unowned.find(received_old_node_index_global);
       if (it != global_to_local_nodes_unowned.end())
       {
+        /*
+        if (MPI::rank(mpi_comm) == 1)
+          std::cout << "Found index" << std::endl;
+        */
+
         // Get local index for received dof
         const int received_old_node_index_local = it->second;
         dolfin_assert(received_old_node_index_local
         if (test_set.find(received_old_node_index_global) != test_set.end())
          error("Oops, problem in new dofmap code.");
         test_set.insert(received_old_node_index_global);
-        //dolfin_assert(local_to_global_unowned.size() == off_process_node_counter);
+        //dolfin_assert(local_to_global_unowned.size() ==
+        //off_process_node_counter);
 
         // Assign local index to remotely numbered node
         //if (MPI::rank(MPI_COMM_WORLD) ==0)
   }
 
   // Sanity check
+  /*
+  if (MPI::rank(mpi_comm) == 1)
+  {
+    for (std::size_t i = 0; i < old_to_new_local.size(); ++i)
+    {
+      std::cout << "!!!test: " << MPI::rank(mpi_comm) << ", " << i << ", "
+                << old_to_new_local[i]
+                << ", " << node_ownership[i] << std::endl;
+    }
+  }
+  */
   for (auto it : old_to_new_local)
   {
-      dolfin_assert(it != -1);
+    dolfin_assert(it != -1);
   }
 
 }

dolfin/fem/DofMapBuilder.h

     //     process
     //   - node_ownership = 0  -> dof owned by this process and shared
     //     with other processes
-    //   - node_ownership = 1  -> dof >owned by this process and not
+    //   - node_ownership = 1  -> dof owned by this process and not
     //     shared
     //
     // Also computes map from shared node to sharing processes and a
         std::shared_ptr<const SubDomain> constrained_domain,
         const std::size_t block_size);
 
-    static void compute_boundary_nodes(
+
+    // Mark shared nodes. Boundary nodes are assigned a random
+    // positive integer, interior nodes are marked as -1, interior
+    // nodes in ghost layer of other processes are marked -2, and
+    // ghost nodes are marked as -3
+    static void compute_shared_nodes(
       std::vector<int>& boundary_nodes,
       const std::vector<std::vector<la_index>>& node_dofmap,
       const std::size_t num_nodes_local,

dolfin/fem/FiniteElement.h

     std::shared_ptr<const FiniteElement>
       extract_sub_element(const std::vector<std::size_t>& component) const;
 
+  protected:
+
+    // UFC finite element
+    std::shared_ptr<const ufc::finite_element> _ufc_element;
+
   private:
 
     // Recursively extract sub finite element
       extract_sub_element(const FiniteElement& finite_element,
                           const std::vector<std::size_t>& component);
 
-    // UFC finite element
-    std::shared_ptr<const ufc::finite_element> _ufc_element;
-
     // Simple hash of the signature string
     std::size_t _hash;
 

dolfin/fem/SparsityPatternBuilder.cpp

     Progress p("Building sparsity pattern over interior facets", mesh.num_facets());
     for (FacetIterator facet(mesh); !facet.end(); ++facet)
     {
-      bool exterior_facet = false;
+      bool this_exterior_facet = false;
       if (facet->num_global_entities(D) == 1)
-        exterior_facet = true;
+        this_exterior_facet = true;
 
       // Check facet type
-      if (exterior_facets && exterior_facet && !cells)
+      if (exterior_facets && this_exterior_facet && !cells)
       {
         // Get cells incident with facet
         dolfin_assert(facet->num_entities(D) == 1);
         // Insert dofs
         sparsity_pattern.insert_local(dofs);
       }
-      else if (interior_facets && !exterior_facet)
+      else if (interior_facets && !this_exterior_facet)
       {
+        if (facet->num_entities(D) == 1)
+        {
+          dolfin_assert(facet->is_ghost());
+          continue;
+        }
+
         // Get cells incident with facet
+        dolfin_assert(facet->num_entities(D) == 2);
         Cell cell0(mesh, facet->entities(D)[0]);
         Cell cell1(mesh, facet->entities(D)[1]);
 
         // Insert dofs
         sparsity_pattern.insert_local(dofs);
       }
-
       p++;
     }
   }

dolfin/fem/SystemAssembler.cpp

   }
 }
 //-----------------------------------------------------------------------------
-const MeshFunction<std::size_t> * _pick_one_meshfunction(std::string name,
-                                                         const MeshFunction<std::size_t> * a,
-                                                         const MeshFunction<std::size_t> * b)
+const MeshFunction<std::size_t>*
+_pick_one_meshfunction(std::string name,
+                       const MeshFunction<std::size_t> * a,
+                       const MeshFunction<std::size_t> * b)
 {
   if ((a && b) && a != b)
   {
       warning("Bilinear and linear forms do not have same %s subdomains \
-in SystemAssembler. Taking %s subdomains from bilinear form", name.c_str(), name.c_str());
+in SystemAssembler. Taking %s subdomains from bilinear form",
+              name.c_str(), name.c_str());
   }
   return a ? a: b;
 }
   // problems
   if (x0)
   {
-    if (MPI::size(mesh.mpi_comm()) > 1)
-    {
-      warning("Parallel symmetric assembly over interior facets for nonlinear \
-problems is untested");
-    }
     dolfin_assert(x0->size()
                   == _a->function_space(1)->dofmap()->global_dimension());
 
   }
   else
   {
-    // Facet-wise assembly is not working in parallel
-    not_working_in_parallel("System assembly over interior facets");
-
     // Assemble facet-wise (including cell assembly)
     facet_wise_assembly(tensors, ufc, data, boundary_values,
                         cell_domains, exterior_facet_domains,
 
   // Create pointers to hold integral objects
   std::array<const ufc::cell_integral*, 2> cell_integrals
-    = { {ufc[0]->default_cell_integral.get(), ufc[1]->default_cell_integral.get()} };
+    = { {ufc[0]->default_cell_integral.get(),
+         ufc[1]->default_cell_integral.get()} };
 
   std::array<const ufc::exterior_facet_integral*, 2> exterior_facet_integrals
     = { { ufc[0]->default_exterior_facet_integral.get(),
   Progress p("Assembling system (cell-wise)", mesh.num_cells());
   for (CellIterator cell(mesh); !cell.end(); ++cell)
   {
+    // Check that cell is not a ghost
+    dolfin_assert(!cell->is_ghost());
+
     // Get cell vertex coordinates
     cell->get_vertex_coordinates(vertex_coordinates);
 
       if (rank == 2) // form == 0
       {
         dolfin_assert(cell_dofs[form][1]);
-        tensor_required = cell_matrix_required(tensors[form], cell_integrals[form],
+        tensor_required = cell_matrix_required(tensors[form],
+                                               cell_integrals[form],
                                                boundary_values,
                                                *cell_dofs[form][1]);
       }
       else
-      {
         tensor_required = tensors[form] && cell_integrals[form];
-      }
 
       if (tensor_required)
       {
                                                    *cell_dofs[form][1]);
           }
           else
-          {
             tensor_required = tensors[form];
-          }
 
           // Add exterior facet tensor
           if (tensor_required)
 assembler");
   }
 
+  // My MPI rank
+  const int my_mpi_rank = MPI::rank(mesh.mpi_comm());
+
   // Collect pointers to dof maps
   std::array<std::vector<const GenericDofMap*>, 2> dofmaps;
   for (std::size_t i = 0; i < 2; ++i)
   std::vector<std::size_t> num_dofs(2);
 
   // Holders for UFC integrals
-  std::array<const ufc::cell_integral*, 2> cell_integrals =
-    { { ufc[0]->default_cell_integral.get(),
-        ufc[1]->default_cell_integral.get() } };
-  std::array<const ufc::exterior_facet_integral*, 2> exterior_facet_integrals =
-    { { ufc[0]->default_exterior_facet_integral.get(),
-        ufc[1]->default_exterior_facet_integral.get() } };
-  std::array<const ufc::interior_facet_integral*, 2> interior_facet_integrals =
-    { { ufc[0]->default_interior_facet_integral.get(),
-        ufc[1]->default_interior_facet_integral.get() } };
+  std::array<const ufc::cell_integral*, 2> cell_integrals
+    = { { ufc[0]->default_cell_integral.get(),
+          ufc[1]->default_cell_integral.get() } };
+  std::array<const ufc::exterior_facet_integral*, 2> exterior_facet_integrals
+    = { { ufc[0]->default_exterior_facet_integral.get(),
+          ufc[1]->default_exterior_facet_integral.get() } };
+  std::array<const ufc::interior_facet_integral*, 2> interior_facet_integrals
+    = { { ufc[0]->default_interior_facet_integral.get(),
+          ufc[1]->default_interior_facet_integral.get() } };
 
   // Check whether integrals are domain-dependent
   bool use_cell_domains = cell_domains && !cell_domains->empty();
-  bool use_exterior_facet_domains = exterior_facet_domains && !exterior_facet_domains->empty();
-  bool use_interior_facet_domains = interior_facet_domains && !interior_facet_domains->empty();
+  bool use_interior_facet_domains
+    = interior_facet_domains && !interior_facet_domains->empty();
+  bool use_exterior_facet_domains
+    = exterior_facet_domains && !exterior_facet_domains->empty();
+
+  // Indicator whether or not tensor is required
+  std::array<bool, 2> tensor_required_cell, tensor_required_facet;
 
   // Iterate over facets
-  ufc::cell ufc_cell[2];
-  std::vector<double> vertex_coordinates[2];
+  std::array<ufc::cell, 2> ufc_cell;
+  std::array<std::vector<double>, 2> vertex_coordinates;
   Progress p("Assembling system (facet-wise)", mesh.num_facets());
   for (FacetIterator facet(mesh); !facet.end(); ++facet)
   {
     // Number of cells sharing facet
-    const std::size_t num_cells = facet->num_entities(mesh.topology().dim());
+    const std::size_t num_cells = facet->num_entities(D);
+
+    // Check that facet is not a ghost
+    dolfin_assert(!facet->is_ghost());
 
     // Interior facet
     if (num_cells == 2)
       // Get cells incident with facet and associated data
       for (std::size_t c = 0; c < 2; ++c)
       {
-        cell[c] = Cell(mesh, facet->entities(mesh.topology().dim())[c]);
+        cell[c] = Cell(mesh, facet->entities(D)[c]);
         cell_index[c] = cell[c].index();
         local_facet[c] = cell[c].index(*facet);
         cell[c].get_vertex_coordinates(vertex_coordinates[c]);
         cell[c].get_cell_data(ufc_cell[c], local_facet[c]);
       }
 
-      // Loop over lhs and then rhs facet contributions
+      const bool process_facet = (cell[0].is_ghost() != cell[1].is_ghost());
+      bool facet_owner = true;
+      if (process_facet)
+      {
+        int ghost_rank = -1;
+        if (cell[0].is_ghost())
+          ghost_rank = cell[0].owner();
+        else
+          ghost_rank = cell[1].owner();
+        dolfin_assert(my_mpi_rank != ghost_rank);
+        dolfin_assert(ghost_rank != -1);
+        if (ghost_rank < my_mpi_rank)
+          facet_owner = false;
+      }
+
+      // Loop over lhs and then rhs contributions
       for (std::size_t form = 0; form < 2; ++form)
       {
         // Get rank (lhs=2, rhs=1)
         const std::size_t rank = (form == 0) ? 2 : 1;
 
-        // Get integral for sub domain (if any)
-        if (use_interior_facet_domains)
-        {
-          const std::size_t domain = (*interior_facet_domains)[*facet];
-          interior_facet_integrals[form]
-            = ufc[form]->get_interior_facet_integral(domain);
-        }
-
-        // Reset some temp data
-        std::fill(ufc[form]->macro_A.begin(), ufc[form]->macro_A.end(), 0.0);
-
-        // Update UFC object
-        // TODO: Remove this? It seems to be unused, update is called
-        //       before each tabulate_tensor below...
-        //ufc[form]->update(cell[0], vertex_coordinates[0], ufc_cell[0],
-        //                 cell[1], vertex_coordinates[1], ufc_cell[1],
-        //                 interior_facet_integrals[form]->enabled_coefficients());
-
         // Compute number of dofs in macro dofmap
         std::fill(num_dofs.begin(), num_dofs.begin() + rank, 0);
         for (std::size_t c = 0; c < num_cells; ++c)
               = &(dofmaps[form][dim]->cell_dofs(cell_index[c]));
             num_dofs[dim] += cell_dofs[form][c][dim]->size();
           }
+
+          // Resize macro dof holder
+          for (std::size_t dim = 0; dim < rank; ++dim)
+            macro_dofs[form][dim].resize(num_dofs[dim]);
+
+          // Tabulate dofs on macro element
+          const std::size_t rank = (form == 0) ? 2 : 1;
+          for (std::size_t dim = 0; dim < rank; ++dim)
+          {
+            std::copy(cell_dofs[form][c][dim]->begin(),
+                      cell_dofs[form][c][dim]->end(),
+                      macro_dofs[form][dim].begin()
+                      + c*cell_dofs[form][0][dim]->size());
+          }
         }
 
-        // Resize macro dof vector
-        for (std::size_t dim = 0; dim < rank; ++dim)
-          macro_dofs[form][dim].resize(num_dofs[dim]);
+
+        // Get facet integral for sub domain (if any)
+        if (use_interior_facet_domains)
+        {
+          const std::size_t domain = (*interior_facet_domains)[*facet];
+          interior_facet_integrals[form]
+            = ufc[form]->get_interior_facet_integral(domain);
+        }
 
         // Check if facet tensor is required
-        bool facet_tensor_required;
-        if (rank == 2) // form == 0
+        if (form == 0)
         {
-          for (std::size_t c =0; c < 2; ++c)
+          for (std::size_t c = 0; c < 2; ++c)
           {
             dolfin_assert(cell_dofs[form][c][1]);
-            facet_tensor_required = cell_matrix_required(tensors[form],
-                                                        interior_facet_integrals[form],
-                                                        boundary_values,
-                                                        *cell_dofs[form][c][1]);
-            if (facet_tensor_required)
+            tensor_required_facet[form]
+              = cell_matrix_required(tensors[form],
+                                     interior_facet_integrals[form],
+                                     boundary_values,
+                                     *cell_dofs[form][c][1]);
+            if (tensor_required_facet[form])
               break;
           }
         }
         else
         {
-          facet_tensor_required = tensors[form]
-            && interior_facet_integrals[form];
-        }
-
-        // Compute facet contribution to tensor, if required
-        if (facet_tensor_required)
-        {
-          // Update to current pair of cells
-          ufc[form]->update(cell[0], vertex_coordinates[0], ufc_cell[0],
-                            cell[1], vertex_coordinates[1], ufc_cell[1],
-                            interior_facet_integrals[form]->enabled_coefficients());
-          // Integrate over facet
-          interior_facet_integrals[form]->tabulate_tensor(ufc[form]->macro_A.data(),
-                                                   ufc[form]->macro_w(),
-                                                   vertex_coordinates[0].data(),
-                                                   vertex_coordinates[1].data(),
-                                                   local_facet[0],
-                                                   local_facet[1],
-                                                   ufc_cell[0].orientation,
-                                                   ufc_cell[1].orientation);
+          tensor_required_facet[form]
+            = (tensors[form] && interior_facet_integrals[form]);
         }
 
-        // If we have local facet 0 for cell[i], compute cell
-        // contribution
-        for (std::size_t c = 0; c < num_cells; ++c)
+        // Get cell integrals
+        for (std::size_t c = 0; c < 2; ++c)
         {
           if (local_facet[c] == 0)
           {
             }
 
             // Check if facet tensor is required
-            bool cell_tensor_required;
-            if (rank == 2) // form == 0
+            if (form == 0)
             {
               dolfin_assert(cell_dofs[form][c][1]);
-              cell_tensor_required
+              tensor_required_cell[form]
                 = cell_matrix_required(tensors[form],
                                        cell_integrals[form],
                                        boundary_values,
                                        *cell_dofs[form][c][1]);
             }
             else
-            {
-              cell_tensor_required = tensors[form] && cell_integrals[form];
-            }
-
-            // Compute cell tensor, if required
-            if (cell_tensor_required)
-            {
-              ufc[form]->update(cell[c], vertex_coordinates[c], ufc_cell[c],
-                                cell_integrals[form]->enabled_coefficients());
-              cell_integrals[form]->tabulate_tensor(ufc[form]->A.data(),
-                                                    ufc[form]->w(),
-                                                    vertex_coordinates[c].data(),
-                                                    ufc_cell[c].orientation);
-
-              // FIXME: Can the below two block be consolidated?
-              const std::size_t nn = cell_dofs[form][c][0]->size();
-              if (form == 0)
-              {
-                const std::size_t mm = cell_dofs[form][c][1]->size();
-                for (std::size_t i = 0; i < mm; i++)
-                {
-                  for (std::size_t j = 0; j < nn; j++)
-                  {
-                    ufc[form]->macro_A[2*nn*mm*c + num_cells*i*nn + nn*c + j]
-                      += ufc[form]->A[i*nn + j];
-                  }
-                }
-              }
-              else
-              {
-                for (std::size_t i = 0; i < cell_dofs[form][c][0]->size(); i++)
-                  ufc[form]->macro_A[nn*c + i] += ufc[form]->A[i];
-              }
-            }
+              tensor_required_cell[form] = tensors[form] && cell_integrals[form];
           }
+        }
 
-          // Tabulate dofs on macro element
-          for (std::size_t dim = 0; dim < rank; ++dim)
-          {
-            std::copy(cell_dofs[form][c][dim]->begin(),
-                      cell_dofs[form][c][dim]->end(),
-                      macro_dofs[form][dim].begin()
-                      + c*cell_dofs[form][0][dim]->size());
-          }
-        } // End loop over cells sharing facet (c)
-      } // End loop over form (form)
+        // Reset work array
+        std::fill(ufc[form]->macro_A.begin(), ufc[form]->macro_A.end(), 0.0);
+      }
 
-      // Modify local tensor for bcs
+      // Compute cell/facet tensor for lhs and rhs
+      std::array<std::size_t, 2> matrix_size;
+      std::size_t vector_size = 0;
+      std::size_t cell_index = 0;
+      for (std::size_t c = 0; c < num_cells; ++c)
+      {
+        if (local_facet[c] == 0)
+        {
+          matrix_size[0] = cell_dofs[0][c][0]->size();
+          matrix_size[1] = cell_dofs[0][c][1]->size();
+          vector_size = cell_dofs[1][c][0]->size();
+          cell_index = c;
+        }
+      }
+      compute_interior_facet_tensor(ufc, ufc_cell,
+                                    vertex_coordinates,
+                                    tensor_required_cell,
+                                    tensor_required_facet,
+                                    cell, local_facet,
+                                    facet_owner,
+                                    cell_integrals,
+                                    interior_facet_integrals,
+                                    matrix_size,
+                                    vector_size);
+
+      // Modify local tensors for bcs
       apply_bc(ufc[0]->macro_A.data(), ufc[1]->macro_A.data(), boundary_values,
                macro_dofs[0][0], macro_dofs[0][1]);
 
       // Add entries to global tensor
-      for (std::size_t form = 0; form < 2; ++form)
+      if (tensors[1])
+        tensors[1]->add_local(ufc[1]->macro_A.data(), macro_dofs[1]);
+
+      const bool add_macro_element
+        = ufc[0]->form.has_interior_facet_integrals();
+      if (tensors[0] && add_macro_element)
+          tensors[0]->add_local(ufc[0]->macro_A.data(), macro_dofs[0]);
+      else if (tensors[0] && !add_macro_element)
       {
-        bool add_macro_element = true;
-        if (form==0) // bilinear form
-          add_macro_element = ufc[0]->form.has_interior_facet_integrals();
-
-        if (tensors[form] && add_macro_element)
-        {
-          tensors[form]->add_local(ufc[form]->macro_A.data(), macro_dofs[form]);
-        }
-        else if (tensors[form] && !add_macro_element) // only true for the bilinear form
-        {
-          // the sparsity pattern may not support the macro element
-          // so instead extract back out the diagonal cell blocks and add them individually
-          for (std::size_t c = 0; c < num_cells; ++c)
-          {
-            if (local_facet[c] == 0)
-            {
-              // Get cell integrals for sub domain (if any)
-              if (use_cell_domains)
-              {
-                const std::size_t domain = (*cell_domains)[cell[c]];
-                cell_integrals[form] = ufc[form]->get_cell_integral(domain);
-              }
-
-              // Check if cell tensor was assembled - not necessary but saves inserting 0s
-              bool cell_tensor_required
-                = cell_matrix_required(tensors[form],
-                                       cell_integrals[form],
-                                       boundary_values,
-                                       *cell_dofs[form][c][1]);
-
-              // Add cell tensor, if required
-              if (cell_tensor_required)
-              {
-                data.zero_cell();
-                const std::size_t nn = cell_dofs[form][c][0]->size();
-                const std::size_t mm = cell_dofs[form][c][1]->size();
-                for (std::size_t i = 0; i < mm; i++)
-                {
-                  for (std::size_t j = 0; j < nn; j++)
-                  {
-                    data.Ae[form][i*nn + j]
-                       = ufc[form]->macro_A[2*nn*mm*c + num_cells*i*nn + nn*c +j];
-                  }
-                }
-                tensors[form]->add_local(data.Ae[form].data(), cell_dofs[form][c]);
-              }
-            }
-          } // End loop over cells sharing facet (c)
-        }
-      } // End loop over form (form)
+        // The sparsity pattern may not support the macro element so
+        // instead extract back out the diagonal cell blocks and add
+        // them individually
+        matrix_block_add(*tensors[0], data.Ae[0], ufc[0]->macro_A,
+                         tensor_required_cell[0], local_facet,
+                         cell_dofs[0][cell_index]);
+      }
     }
     else // Exterior facet
     {
-      // Get mesh cell to which mesh facet belongs (pick first, there
-      // is only one)
+      // Get mesh cell to which mesh facet belongs (pick first,
+      // there is only one)
       Cell cell(mesh, facet->entities(mesh.topology().dim())[0]);
 
-      // Get local index of facet with respect to the cell
-      const std::size_t local_facet = cell.index(*facet);
-
-      // Get cell data
-      cell.get_vertex_coordinates(vertex_coordinates[0]);
-      cell.get_cell_data(ufc_cell[0], local_facet);
-
-      // Initialize macro element matrix/vector to zero
-      data.zero_cell();
-
-      // Loop over lhs and then rhs facet contributions
+      // Decide if tensor needs to be computed
       for (std::size_t form = 0; form < 2; ++form)
       {
         // Get rank (lhs=2, rhs=1)
         const std::size_t rank = (form == 0) ? 2 : 1;
 
-        // Get local-to-global dof maps for cell
-        for (std::size_t dim = 0; dim < rank; ++dim)
+        // Get cell integrals for sub domain (if any)
+        if (use_cell_domains)
         {
-          cell_dofs[form][0][dim]
-            = &(dofmaps[form][dim]->cell_dofs(cell.index()));
+          const std::size_t domain = (*cell_domains)[cell];
+          cell_integrals[form] = ufc[form]->get_cell_integral(domain);
         }
 
-        // Reset some temp data
-        std::fill(ufc[form]->A.begin(), ufc[form]->A.end(), 0.0);
-
         // Get exterior facet integrals for sub domain (if any)
         if (use_exterior_facet_domains)
         {
             = ufc[form]->get_exterior_facet_integral(domain);
         }
 
-        // Check if facet tensor is required
-        bool facet_tensor_required;
-        if (rank == 2) // form == 0
+        // Get local-to-global dof maps for cell
+        for (std::size_t dim = 0; dim < rank; ++dim)
+        {
+          cell_dofs[form][0][dim]
+            = &(dofmaps[form][dim]->cell_dofs(cell.index()));
+        }
+
+        // Store if tensor is required
+        if (rank == 2)
         {
           dolfin_assert(cell_dofs[form][0][1]);
-          facet_tensor_required
+          tensor_required_facet[form]
             = cell_matrix_required(tensors[form],
                                    exterior_facet_integrals[form],
                                    boundary_values,
                                    *cell_dofs[form][0][1]);
+          tensor_required_cell[form]
+            = cell_matrix_required(tensors[form],
+                                   cell_integrals[form],
+                                   boundary_values,
+                                   *cell_dofs[form][0][1]);
         }
         else
         {
-          facet_tensor_required = (tensors[form] && exterior_facet_integrals[form]);
-        }
-
-        // Compute facet integral,if required
-        if (facet_tensor_required)
-        {
-          // Update UFC object
-          ufc[form]->update(cell, vertex_coordinates[0], ufc_cell[0],
-                            exterior_facet_integrals[form]->enabled_coefficients());
-          exterior_facet_integrals[form]->tabulate_tensor(ufc[form]->A.data(),
-                                                          ufc[form]->w(),
-                                                          vertex_coordinates[0].data(),
-                                                          local_facet,
-                                                          ufc_cell[0].orientation);
-          for (std::size_t i = 0; i < data.Ae[form].size(); i++)
-            data.Ae[form][i] += ufc[form]->A[i];
+          tensor_required_facet[form]
+            = (tensors[form] && exterior_facet_integrals[form]);
+          tensor_required_cell[form]
+            = tensors[form] && cell_integrals[form];
         }
+      }
 
-        // If we have local facet 0, assemble cell integral
-        if (local_facet == 0)
-        {
-          // Get cell integrals for sub domain (if any)
-          if (use_cell_domains)
-          {
-            const std::size_t domain = (*cell_domains)[cell];
-            cell_integrals[form] = ufc[form]->get_cell_integral(domain);
-          }
-
-          // Check if facet tensor is required
-          bool cell_tensor_required;
-          if (rank == 2)
-          {
-            dolfin_assert(cell_dofs[form][0][1]);
-            cell_tensor_required = cell_matrix_required(tensors[form],
-                                                        cell_integrals[form],
-                                                        boundary_values,
-                                                        *cell_dofs[form][0][1]);
-          }
-          else
-          {
-            cell_tensor_required = tensors[form] && cell_integrals[form];
-          }
-
-          // Compute cell integral, if required
-          if (cell_tensor_required)
-          {
-            ufc[form]->update(cell, vertex_coordinates[0], ufc_cell[0],
-                             cell_integrals[form]->enabled_coefficients());
-            cell_integrals[form]->tabulate_tensor(ufc[form]->A.data(),
-                                                  ufc[form]->w(),
-                                                  vertex_coordinates[0].data(),
-                                                  ufc_cell[0].orientation);
-            for (std::size_t i = 0; i < data.Ae[form].size(); i++)
-              data.Ae[form][i] += ufc[form]->A[i];
-          }
-        }
-      } // End loop over forms [form]
+      // Compute cell/facet tensors
+      compute_exterior_facet_tensor(data.Ae, ufc, ufc_cell[0],
+                                    vertex_coordinates[0],
+                                    tensor_required_cell,
+                                    tensor_required_facet,
+                                    cell, *facet,
+                                    cell_integrals,
+                                    exterior_facet_integrals);
 
       // Modify local matrix/element for Dirichlet boundary conditions
       apply_bc(data.Ae[0].data(), data.Ae[1].data(), boundary_values,
   }
 }
 //-----------------------------------------------------------------------------
-inline void SystemAssembler::apply_bc(double* A, double* b,
-                                      const DirichletBC::Map& boundary_values,
-                                      const std::vector<dolfin::la_index>& global_dofs0,
-                                      const std::vector<dolfin::la_index>& global_dofs1)
+void SystemAssembler:: compute_exterior_facet_tensor(
+  std::array<std::vector<double>, 2>& Ae,
+  std::array<UFC*, 2>& ufc,
+  ufc::cell& ufc_cell,
+  std::vector<double>& vertex_coordinates,
+  const std::array<bool, 2>& tensor_required_cell,
+  const std::array<bool, 2>& tensor_required_facet,
+  const Cell& cell,
+  const Facet& facet,
+  const std::array<const ufc::cell_integral*, 2>& cell_integrals,
+  const std::array<const ufc::exterior_facet_integral*, 2>& exterior_facet_integrals)
+{
+  // Get local index of facet with respect to the cell
+  const std::size_t local_facet = cell.index(facet);
+
+  // Get cell data
+  cell.get_vertex_coordinates(vertex_coordinates);
+  cell.get_cell_data(ufc_cell, local_facet);
+
+  // Loop over lhs and then rhs facet contributions
+  for (std::size_t form = 0; form < 2; ++form)
+  {
+    // Initialize macro element matrix/vector to zero
+    std::fill(Ae[form].begin(), Ae[form].end(), 0.0);
+    std::fill(ufc[form]->A.begin(), ufc[form]->A.end(), 0.0);
+
+    // Compute facet integral,if required
+    if (tensor_required_facet[form])