You need a recent version of Python. These scripts have been tested with Python 2.4 and 2.5.
You will also need the Python OGR bindings and the RPy module.
On Debian-like systems (including Ubuntu), this is just a matter of installing the following packages: python-gdal, python-rpy.
To run the script, open a terminal in the directory that contains the files, and type:
python example.py 3.1
3.1 is the magic number that we pass to the script. If you don't specify a magic number, the default value is 3.0.
The usual output is something like this:
by hand
   Min.  1st Qu. 3rd Qu. Median  Max.   Mean
   9.90  12.00   16.50   13.50   30.20  15.01

Simple Height
   Min.  1st Qu. 3rd Qu. Median  Max.   Mean
   9.093 12.780  20.230  15.170  48.190 16.860

Smart Height
   Min.  1st Qu. 3rd Qu. Median  Max.   Mean
   7.768 11.540  18.410  13.640  34.670 15.150
The image plotR.png is created in the current directory and shows a graphical comparison between the various recording methods.
This Python library was written as an aid for the study of stone walls, mainly through the quantitative analysis of the spatial dimensions of stones.
The code is still in the early stages of development and lacks a clear structure, but the functions are quite well documented with docstrings. At the moment there is an example.py that shows how to use the library routines, an example dataset (made up of several files), and 3 Python scripts:
- geometrywall.py has all the geometry (OGR) related functions
- rplots.py contains the RPlots class that can be used to output summaries and graphs
- breaks.py has the code for automatically classifying stones into rows
The numerical analysis is done with R using the rpy Python module. We try to use only methods and functions from the standard R library.
- the ogr module is used to import geometry data and get all the needed parameters like centroid and boundary coordinates
- height is calculated with two different methods:
- a simple method: max(y) - min(y)
- the smart method
- we also store a central measure (like median) to be used in the next steps as a parameter to find stone rows
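These two steps can be sketched as follows (the function names and sample values are illustrative, not the library's own):

```python
def simple_height(ys):
    """Rough height: highest minus lowest Y coordinate of the boundary."""
    return max(ys) - min(ys)

def median(values):
    """Central measure stored for the later row-finding step."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2.0

print(round(simple_height([395.16, 401.2, 410.34]), 2))
print(median([9.9, 12.0, 13.5, 16.5, 30.2]))
```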
Each stone is assigned a rank number using a kernel density function (density in the R stats standard library) with a narrow bandwidth. This works because the vertical coordinates of the centroids are not distributed uniformly, but are concentrated around each row's mean value. Thus, stones that lie on the same course get the same rank number. This enables other kinds of analysis based on the variance of single courses, and other methods still to be explored.
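In the library this ranking goes through rpy and R's density; the idea can be sketched in pure Python with a hand-rolled Gaussian kernel (the bandwidth, grid step and function names here are illustrative assumptions, not the library's code):

```python
import math

def kernel_density(ys, bandwidth, grid):
    """Gaussian kernel density estimate of ys, evaluated at each grid point."""
    n = len(ys)
    norm = n * bandwidth * math.sqrt(2 * math.pi)
    return [sum(math.exp(-0.5 * ((g - y) / bandwidth) ** 2) for y in ys) / norm
            for g in grid]

def assign_rows(ys, bandwidth=0.5):
    """Assign each centroid Y the rank of the nearest density peak (row centre)."""
    lo, hi = min(ys) - bandwidth, max(ys) + bandwidth
    step = bandwidth / 4.0
    grid = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    dens = kernel_density(ys, bandwidth, grid)
    # local maxima of the density curve mark the courses, bottom to top
    peaks = [grid[i] for i in range(1, len(grid) - 1)
             if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]]
    return [min(range(len(peaks)), key=lambda k: abs(y - peaks[k])) for y in ys]

# three courses of stones with centroid Y values around 1, 4 and 7
print(assign_rows([0.9, 1.1, 1.0, 3.9, 4.1, 4.0, 6.9, 7.1]))
```

Because the centroid Y values cluster around each course, a narrow bandwidth keeps the clusters as separate density peaks and every stone inherits its course's rank.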
To best compare the measures taken by hand and the automatic ones, we need a complete and detailed case study that is well drawn and has an attribute table containing the hand-taken measures.
the smart method
The basic algorithm works by finding the highest and the lowest Y coordinates of the stone polygon; a simple max(y) - min(y) subtraction then gives a rough estimate of the true height of the stone. Tests comparing hand-recorded measures with this method show that the estimate is far too coarse for our needs. The hand-taken measures are our reference because an expert human operator records a meaningful value for the stone height, and thus behaves quite differently from this simple algorithm.
--insert images and graphs here--
We can try to get a smart height by averaging the n highest and n lowest Y coordinates. This way our max(y) - min(y) becomes something slightly different: a difference between the average upper limit and the average lower limit. We use a magic number to express the ratio of total points to be used in this calculation.
If magicNumber = 3.0 we are going to use totalPoints / magicNumber points for the upper limit and totalPoints / magicNumber points for the lower limit.
    magicNumber = 3.0
    self.stonePointsCount = stoneBoundary.GetPointCount()
    pointsRatio = int(self.stonePointsCount / magicNumber) + 1
The stoneBoundary object is the OGR boundary (of type LINESTRING) of the stone. The OGR GetPointCount() method simply returns the number of points in a LINESTRING.
The pointsRatio variable is used to calculate, for each stone, how many points are needed to compute the smart height. This way the algorithm is consistent across all the stones: using a fixed value does not make sense here, because some stones have only a few points and others more than 20. The value self.stonePointsCount / magicNumber is a floating point number, which we must convert to an integer before using it.
We should also consider how adding 1 to this variable (or not) changes the results.
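A quick check of what the +1 does, for a few hypothetical point counts:

```python
magicNumber = 3.0
for count in (4, 10, 21):
    ratio = int(count / magicNumber)
    print(count, ratio, ratio + 1)
```

Without the +1, a 4-point boundary would average over a single point per limit, which is just the simple method again; with it, at least two points always enter each average.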
    def smartAlgo(listey, ratio):
        '''`smart' algo with average coordinates.'''
        listey.sort()
        asd = 0
        for i in listey[0:ratio]:
            asd = asd + i
        yMin = asd / ratio
        asd = 0
        for i in listey[-ratio:]:
            asd = asd + i
        yMax = asd / ratio
        yAvg = yMax - yMin
        return yAvg
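The same averaging can be written more compactly with sum(); this self-contained sketch (with invented coordinate values) shows the method on a 6-point boundary, where magicNumber = 3.0 gives int(6 / 3.0) + 1 = 3 points per limit:

```python
def smart_height(ys, ratio):
    """Average the `ratio` lowest and `ratio` highest Y coordinates,
    then return the difference between the two averages."""
    ys = sorted(ys)
    y_min = sum(ys[:ratio]) / float(ratio)   # average lower limit
    y_max = sum(ys[-ratio:]) / float(ratio)  # average upper limit
    return y_max - y_min

# hypothetical Y coordinates of a stone boundary
ys = [395.16, 396.0, 397.5, 407.9, 409.0, 410.34]
print(round(smart_height(ys, 3), 2))
```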
the smart 2 method
This second smart algorithm works in a slightly different way from the first one.
Instead of using a predetermined number of points for averaging the upper and lower limits, we use a range of Y coordinates based on the extreme values.
So, if p_max is the point with the highest Y coordinate, and p_min the one with the lowest, we first obtain the simple height with:
simple_height = y(p_max) - y(p_min)
Then, the points that will be used for the average upper limit are those whose Y coordinate is such that:
(y(p_max) - ( simple_height / n )) < y(p) <= y(p_max)
where n expresses the range to be used, proportional to the stone height. Note that the condition is y(p) <= y(p_max): if it were a strict inequality, p_max itself would be excluded from the procedure, exposing the code to ZeroDivisionError and other bugs. The same applies to the lower limit. Once the points to use have been selected, the two averages are calculated and their difference is the resulting smart_2_height of the stone.
An example should make this clearer. If y(p_max) for our stone is 410.34 and y(p_min) = 395.16, it's easy to obtain:
simple_height = y(p_max) - y(p_min) = 410.34 - 395.16 = 15.18
Then, to find the average upper limit, we iterate through all the points of the current stone and select only those such that:
(y(p_max) - ( simple_height / n )) < y(p) <= y(p_max)
(410.34 - (15.18 / n)) < y(p) <= 410.34
Which value should we give to n? Some experiments showed that values around 7 give good results, so let's use this value for now:
(410.34 - (15.18 / 7)) < y(p) <= 410.34
408.17 < y(p) <= 410.34
So, only those points that are in that range will be included in the average upper limit.
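The selection rule can be sketched as a standalone function (a hypothetical illustration with invented coordinates, not the library's actual code):

```python
def smart2_height(ys, n=7):
    """Smart 2 height: average the Y coordinates that lie within 1/n of the
    simple height from each extreme, then take the difference."""
    y_max, y_min = max(ys), min(ys)
    margin = (y_max - y_min) / float(n)      # simple_height / n
    # the extremes themselves are always included, so neither list is empty
    upper = [y for y in ys if y_max - margin < y <= y_max]
    lower = [y for y in ys if y_min <= y < y_min + margin]
    return sum(upper) / len(upper) - sum(lower) / len(lower)

ys = [395.16, 396.0, 397.5, 407.9, 409.0, 410.34]
print(round(smart2_height(ys), 2))
```

With these values only 409.0 and 410.34 fall in the upper range and only 395.16 and 396.0 in the lower one, so the result depends on how many boundary points crowd each extreme rather than on a fixed count.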
So far, both algorithms work quite well when compared with hand-made measurements, while the differences between the two are barely significant.
Given this correspondence, we should choose the faster and simpler one. Another important issue is the choice of parameters. At the moment parameters must be specified manually by the user (or fixed in the source code), and there are no plans to change this.
This analysis is optionally based on the height values calculated with one of the methods above.
In 1993, Parenti and Doglioni suggested the use, among other qualitative parameters, of a quantitative parameter which would be useful to describe a stone wall and eventually compare two walls.
This quantitative parameter is calculated on a random area from the wall, as the ratio between the area occupied by stones and the "empty" areas around the stones. This value, if compared to other numeric parameters (most notably the number of stones that fall into the same area), can be useful when creating a typology.
Our analysis is based on this method, but pushes the same concept beyond the wall taken as a whole, down to the single stone.
For each stone polygon, we start by creating a buffer area: the area that contains all the points within a certain distance from the polygon boundary. How the distance is chosen can slightly change the results, depending on whether a fixed value is used or a value proportional to some other property of each polygon (area, perimeter, height, width, etc.).
After the buffer area has been created, the procedure is as follows:
- subtract the stone area from the buffer area (which includes it) to get only the actual buffer
- find the intersection of the buffer area with the other surrounding stones (obtained by creating a multipolygon that includes all stones) and retrieve the total area of this intersection
- use the intersection_area / buffer_area ratio as an indicator of (possibly hidden) groups of stones
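The three steps can be roughly sketched as follows, using axis-aligned rectangles (xmin, ymin, xmax, ymax) in place of OGR polygons; the real code would use OGR's Buffer, Difference and Intersection methods, and here neighbours are assumed disjoint so their intersections can simply be summed:

```python
def area(r):
    """Area of a rectangle (xmin, ymin, xmax, ymax); 0 if degenerate."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def buffer_rect(r, dist):
    """Stand-in for OGR's Buffer(): grow the rectangle by dist on all sides."""
    return (r[0] - dist, r[1] - dist, r[2] + dist, r[3] + dist)

def intersection(a, b):
    """Intersection rectangle of a and b (possibly degenerate)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def crowding_ratio(stone, neighbours, dist):
    buf = buffer_rect(stone, dist)
    ring = area(buf) - area(stone)  # the buffer minus the stone itself
    overlap = sum(area(intersection(buf, n)) for n in neighbours)
    return overlap / ring

stone = (0.0, 0.0, 2.0, 1.0)
neighbours = [(2.2, 0.0, 4.0, 1.0), (0.0, 1.2, 2.0, 2.0)]
print(crowding_ratio(stone, neighbours, 0.5))
```

A stone hemmed in on all sides yields a ratio close to 1, while a stone at the wall edge, with empty space around it, yields a lower value.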
So far, this method has failed to give the expected results. Lower values are obtained for stones near the wall limits, or that otherwise have no stones on one or more sides; for the remaining stones, there are no significant variations in the obtained value.
Though not related to the original idea, this method could be used to find which stones are suitable for further analyses based on the wall fabric.