fao_openjump / Home

2013-09-30

Example tooltip.

eLocust3 example tooltip

2013-10-29

Development of an eLocust tree-based GUI and correction of language duplicates.

Tasks completed to date:

  • Design/implement a way to automatically read in language duplicate errors within the merge tool.
  • Implement a language duplicate error format within XLS.
  • First pass of language duplicate identification - this needs to be triple checked.
  • Development of eLocust3 tree structure data expand/collapse functions.
  • Development of tree icon behaviour.
  • Development of error removal from the tree structure.
  • Packaging and documentation of all Mac OS X dependencies with the repository.
  • Documentation and test build to upgrade the modified OJ core.

Current tasks for Nov 2013:

  • Test the language duplicate correction algorithm in all databases.
  • Completion of the eLocust3 data import module user interface - sent to testers for feedback.
  • Design the NOVACOM --> eLocust3 database structure (dependent on NovaCom).
  • Finalize the eLocust3 ingest file for the RAMSESv4 database (dependent on NovaCom).
  • Completion of survey tree functionality.
  • Completion of error correction functionality.

2013-09-30

Development of an eLocust3 import data model and tree-based GUI.

This is the eLocust3 class diagram.

eLocust3 import model

This is an example of the user import function showing data errors.

Example UI showing errors within NovaCom file

Tasks completed to date:

  • Provide merge output so that the manual checking for language duplicates can be started.
  • Design a way to automatically correct language duplicates with the results of the step above. This is still waiting on other team members.
  • Prototype development of the eLocust3 data import/export module to work with the next generation of NovaCom satellite.
  • Cross-checking the NOVACOM eLocust3 file against the RAMSESv4 structure.
  • Arrive at the eLocust3 ingest file for the RAMSESv4 database. We are still waiting for confirmation from NovaCom.

Current tasks for October 2013:

  • Design/implement a way to automatically correct language duplicates in all databases.
  • Completion of the eLocust3 data import module user interface - sent to testers for feedback.
  • Design the NOVACOM --> eLocust3 database structure.
  • Finalize the eLocust3 ingest file for the RAMSESv4 database.

2013-09-01

Development of a global database merge tool.

The tool is used to identify duplicate tables, duplicate columns and incorrect datatypes within the 2000 different RAMSESv3 columns. It merges the data into a unified global data structure.

The tool used to merge all country databases

Tasks completed to date:

1. Modified the database merge tool
  • Set up new code repository (fork) to modify merge tool.
  • Modified merge tool to ingest external input e.g. a list of manual changes to the database structure.
  • Completed the automated identification of relational links using MS Interop assemblies (Access/Excel 2003).
  • Mapped relational links during the conversion of MS Access files to PostgreSQL.
  • Added identified relational links into the new table create statements.
  • Modified merge tool to treat all Larvae tables as Locust tables when conducting cross-table analysis.
  • Developed application configuration settings so that database parameters are read from an app.config file rather than hard-coded.
  • Tested the updated merge process using existing MS Access data.
  • Updated the merge tool documentation to reflect the modified merge process.
  • Developed a way to automatically generate a spreadsheet of the new structure using MS Interop assemblies (Access/Excel 2003).
2. Implemented database changes
  • Completed a new unified merge (all countries) in order to create the relational joins within the database.
  • Completed the merge process between Locust and Larvae tables using the updated merge tool.
  • Produced a new output schema in SchemaSpy.
  • Produced a new output schema as a spreadsheet.
3. Misc
  • Attended meeting in Rome to discuss database structure.
  • Organised/attended a series of weekly meetings with Hichem Dridi (CLCPRO) / Ahmed Salem (MAU CNLA) to discuss database merge.
  • Research into optimal eLocust3 ingest

Current tasks for Sept 2013:

  • Provide merge output so that the manual checking for language duplicates can be started.
  • Design a way to automatically correct language duplicates with the results of the step above.
  • Prototype development of eLocust3 data import/export module to work with the next generation of NovaCom satellite.
  • Cross checking the NOVACOM eLocust3 file against the RAMSESv4 structure.
  • Arrive at the eLocust3 ingest file for the RAMSESv4 database.

2013-08-01

Project Meeting

Objective:

A meeting was held in Rome to discuss the RAMSESv4 database structure.

Background:

RAMSESv3 technical problems:

  • A legacy system based on an obsolete commercial technology stack.
  • It will no longer continue to work once the Inmarsat satellite is decommissioned (fall 2014).
  • FAO cannot upgrade a legacy system, so is forced to implement a new regional information system.
  • To avoid future legacy issues, FAO decided to implement an Open Source technology stack.
  • The key to this is the PostGIS spatial database system.
  • Storing data in an open spatial database system significantly lowers software development costs.
  • The new technology stack has an approximate commercial value of $500,000.00 per year.

General Technical Issues Identified:

  • RAMSESv3 had no centralised design strategy - this created a series of costly deployment/maintenance problems.
  • RAMSESv4 seeks to bring the v3 systems together to significantly lower these costs.

Requirements needed in order to continue

The current information system consists of over two dozen separate country systems - we first need to unify these into a single, maintainable and deployable global system.

In order to bring everything together, many of the technical/design errors introduced in RAMSESv3 need to be corrected.

The biggest technical issue is that of the underlying database structure, or the lack of it. This means that the current system(s) cannot be upgraded or modernised any further.

Consolidation into a single system will provide:
  • Interoperability with other global software systems that adhere to OCG standards.
  • Internal database analysis, rather than dependencies on expensive third party software e.g. commercial off the shelf (COTS) GIS products.

RAMSES meeting 17 -18 June 2013

Specific Database Issues Identified (see issue tracker for further details):

  • Link Locust and Control table(s) (item #147).
  • Removal of language duplicates - duplicate columns need to be removed (item #71).
  • Larvae table needs to be merged with the unified locust table (item #145).
  • A solution needs to be developed for how to update each country database in situ without re-installing (item #104).
  • Fledgling maturity problem (item #124).
  • eLocust data import plugin and database structure (#143)
  • Relational joins need to be implemented (item #134).
  • Date issues need to be resolved (item #125).
  • Locust table is incorrectly modelled and too large (item #27). It is likely that this will be too expensive to fix.

Post meeting observations:
  • DLIS is currently 30% of the way through a major (global) technology transition.
  • The transition is being forced by redundancy of the old technology stack.
  • The old technology stack had inherent design errors - these were largely database related.
  • A combination of legacy technology and design errors has led to a threefold increase in technological requirement.
  • Resource allocation has been cut by 70% and underestimated by 93%.
  • It is likely that budget holders do not understand the technical or resource requirements needed to deliver the project.
  • It is a difficult situation for stakeholders because end of life has coincided with budget cuts.
  • Current technology cannot be updated and resources needed to implement a replacement are highly limited.

Some possible solutions:

  • Tasks need to be prioritised according to resources available.
  • Realistic deadlines need to be set.
  • Time appreciation needs to be viewed with reference to impact on capability.
  • Budget holders are non-technical and need to be advised why costly technological problems occur, so as to avoid them in the future.
  • Resources and technical skills requirement need to be revised.

2013-06-11

Begin Contract - project resumed.

2013-04-10

Mandatory Contract Break - project stopped.

2013-04-30

CRC/SWAC DLIO workshop 2013

CRC/SWAC DLIO workshop 2013

Desert Locust Information Officers from ten countries (Egypt, Eritrea, Ethiopia, India, Iran, Oman, Pakistan, Saudi Arabia, Sudan, Yemen) and the Desert Locust Control Organization for Eastern Africa attended.

The RAMSESv4 GIS system was successfully installed, with the first training given to attendees of the CRC/SWAC DLIO workshop.

Thank you to all participants involved for helping to make this a technical success.

2013-04-12

Raster Download Module

  • The image below shows the new raster download functionality in development - Blue Marble imagery has been downloaded from NASA and is displayed in RAMSESv4.

Blue Marble Test

2013-04-10

Completed issues

  • Background ECW data has been successfully converted to GeoTIFF using GDAL.
  • The main application has been modified to account for the removal of ECW data.
  • Landsat ECW data will be converted to TIFF and served via GeoServer because the decompression load on individual PCs is too great.
  • Zoom control bug has been rectified - the map will automatically zoom to the extent of the locust selection.
  • DLC has been added as a country group (North Sudan, Eritrea, Ethiopia, Djibouti, Somalia).

DLC as country group

  • Djibouti and Eritrea have RAMSESv4 systems available.
  • Missing raster data since 2008 has been successfully downloaded to HQ ahead of the Cairo workshop.
  • Acceptance testing is currently underway ahead of the Cairo workshop.
  • Preliminary testing & training is expected to start April 2013.

2013-03-09

JAVA installation successfully tested in Oman

The first (Libyan) release of the installer was designed solely for 32-bit Windows XP. It needed further modification to support Windows 7 32/64-bit - for this reason the new installer is now 60% larger.

The RAMSESv4 install process consists of three parts:

1. Java package installation (Java Runtime + Java Advanced Imaging).

2. Database installation (PostgreSQL + PostGIS + vector data import).

3. RAMSESv4 application installation.

The installer allows deployment by a non-technical user - whilst almost all issues can be overcome by users with technical skills, a non-technical user will fail. This means that the installer needs to automatically handle as many issues as the different versions of Windows can throw at it. We have identified these issues by testing on as many machines as possible.

Results:

Sections (2) & (3) are suitable for deployment after quite rigorous testing - none of the identified issues have re-appeared. Part (1), the Java installation, assumes some level of technical experience - therefore a lot of work over the last 14 days has gone into handling the Java installation, which sits on top of the operating system. Test results show that 95% of all issues can be handled automatically - however I have just located another minor issue with respect to the setting of the JAVA_HOME variable: a simple issue for a technical person, but not something a non-technical user can resolve. With this issue fixed I will release to the test team shortly.
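
For illustration only (the installer itself is not written in Java and its internals may differ), a check of the kind described verifies that JAVA_HOME is set and points at a usable Java installation:

```java
import java.io.File;

public class JavaHomeCheck {
    public static void main(String[] args) {
        String javaHome = System.getenv("JAVA_HOME");
        if (javaHome == null || javaHome.isEmpty()) {
            System.err.println("JAVA_HOME is not set - installation cannot continue.");
            return;
        }
        // On Windows the launcher is bin\java.exe; elsewhere it is bin/java.
        boolean windows = System.getProperty("os.name").toLowerCase().contains("win");
        File launcher = new File(javaHome, "bin" + File.separator + (windows ? "java.exe" : "java"));
        if (launcher.isFile()) {
            System.out.println("JAVA_HOME looks valid: " + javaHome);
        } else {
            System.err.println("JAVA_HOME is set but no Java launcher was found at " + launcher);
        }
    }
}
```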

The initial RAMSESv4 install is a one-off requirement (hence the workload). The installer makes this simple: one double click to install/uninstall. A simple user experience is inversely correlated to back-end complexity.

A full production data load test will follow before patching the original installer.

NEW installer features:

  • install.log records the installation.
  • System check - minimum RAM of 2GB is met.
  • System check - no previous Java installed.
  • System check - Windows version and architecture logged.
  • System check - no previous Java Advanced Imaging (JAI) installed.
  • System error and bail-out messages on system checks.
  • Checks installation of software using the native Windows Management Instrumentation Command-line (WMIC).
  • Added error handling for incorrect Java installation.
  • Automated JAI install.
  • Added error handling for incorrect JAI installation.
  • Automated uninstall of JAI with the uninstaller.
  • Translation of installer into French/English (handling of French accented characters).

NEW RAMSESv4 features:

  • Development of a complete logging framework using Apache log4j.

Detailed log files can be produced and sent back to the technical support team for analysis and bug fixing. This is available to developers of ALL plugins because it has been integrated within the RAMSESv4 framework. Levels are easily set by the user - see below.
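
For illustration only (the class and messages below are hypothetical, not the actual RAMSESv4 plugin API), a plugin can obtain a log4j 1.x logger, set its level and emit messages like this:

```java
import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class PluginLoggingExample {
    // Hypothetical class name - RAMSESv4's real plugin classes will differ.
    private static final Logger LOG = Logger.getLogger(PluginLoggingExample.class);

    public static void main(String[] args) {
        BasicConfigurator.configure(); // console appender for this stand-alone sketch
        LOG.setLevel(Level.DEBUG);     // the user-selected level controls how much detail is written

        LOG.info("Starting data import");
        LOG.debug("Parsed 125 survey records from the input file");
        try {
            throw new IllegalStateException("example failure");
        } catch (IllegalStateException e) {
            // Full stack traces end up in the log file sent back to technical support.
            LOG.error("Import failed", e);
        }
    }
}
```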

Detailed logging for technical support

2013-03-09

WIKI update back in progress.

(Quick Synopsis of Progress).

  • Nov 2012 Sudan test proceeded - all OK.
  • Jan 2013 Libya test proceeded - a decision was made to update the installer for Windows 7 32/64 bit; issues were noted with installation between different systems, mainly due to Windows UAC.
  • Data (500GB) and application sent out via diplomatic pouch.
  • Mac OS X trial successful.
  • Due to host system requirements the installer needed to be rebuilt - this is currently in progress.
  • Test team established and using the issue tracker.

The diagram below illustrates the 3 phases of software development.

1. Production - (developer tests within his environment)

2. Testing - (new issues are identified and subsequently fixed - quality is correlated to the number of testers available to test)

3. Deployment (hopefully most of the issues have been identified within the tester(s) environment(s) )

Issues are found by testers and fed back to the developers - the time this takes is a product of the red and green arrows in the bottom diagram. It is not advisable to go direct from production to deployment because end users become the testers (this complicates matters exponentially). It is the speed of the red arrow that is controlling the first release of RAMSESv4. Note: the 16 week development time allowed us to get to stage 2 (Libya).

Test Stages

Diagram illustrating Agile testing and delivery

The Agile Progress

2012-08-24

Data unification and import complete for Sudan.

(Automated error correction - complete).

  • Duplicate tables have been merged.
  • Duplicate columns have been merged.
  • Datatype errors have been corrected.
  • Data length errors have been corrected.
  • Case errors have been corrected.

(Manual error correction - to complete).

  • Remove language duplicates

Below is an example of language duplicates in the soils table (staff resources TBC).

Language Duplicates

Current Progress (GREEN = complete)

Unification Progress

2012-08-22

Option to show Decimal:Degrees in Degrees:Minutes:Seconds

Below is the requested feature: the option to display RAMSES map units in dd:mm:ss.

DD to DD:MM:SS

2012-08-22

PostGIS 2.0 Geography Data Type

Below are the results of the latest test with the Geography datatype in PostGIS.

The display below shows the complete data set for DLC. Note: the system has been indexed so that data retrieval and display is very fast.

Geography Data Type

The Geography data type has been chosen over Geometry due to the global distribution of the data. All measurements will be in meters using the WGS84 datum.
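
For illustration (a sketch only - the connection details and coordinates below are placeholders, not project data), ST_Distance between two geography values returns the separation in meters on the WGS84 spheroid, which is the behaviour that motivated the choice:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class GeographyDistanceExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        try (Connection c = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/ramses", "user", "password")) {
            // ST_Distance on the geography type returns meters on the WGS84 spheroid,
            // which is why geography was preferred over geometry for globally distributed data.
            String sql = "SELECT ST_Distance("
                    + "  'SRID=4326;POINT(31.24 30.05)'::geography,"   // Cairo (lon lat)
                    + "  'SRID=4326;POINT(46.68 24.71)'::geography)";  // Riyadh (lon lat)
            try (PreparedStatement ps = c.prepareStatement(sql);
                 ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    System.out.printf("Distance in meters: %.0f%n", rs.getDouble(1));
                }
            }
        }
    }
}
```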

2012-08-17

HQ server stack

Current Progress

Below is a schematic diagram of the new Open Source server stack. This system will be used to provide seamless global access to operational data - image source: http://opengeo.org

It is composed of the following technologies:

  • PostGIS 2.0 the spatial database engine that accommodates both RAMSES and SWARMS data.
  • GeoServer and GeoWebCache - applications that serve both imagery and vector data as layers.
  • GeoExt - the user front end.

Linux Server Stack

2012-08-15

PostGIS 2.0 testing complete

Current Progress

We have been testing the RAMSES modified core with the latest version of PostGIS. We can confirm that PostGIS v2.0 will ship with the beta release.

This version of the database system has been in development for many years, with this release (2.0) representing a major technological advancement in the world of geospatial systems. So much so that ESRI are making ArcGIS 10.1 compatible, without the need for additional licensing or middleware. Note: this is likely to apply to vector data only - read/write/edit.

New features of this release include:

  • Raster data and raster/vector analysis in the database
  • Topological models to handle objects with shared boundaries
  • PostgreSQL typmod integration, for an automagical geometry_columns table
  • 3D and 4D indexing
  • Index-based high performance nearest-neighbour searching
  • Many more vector functions, including:
  • ST_Split
  • ST_Node
  • ST_MakeValid
  • ST_OffsetCurve
  • ST_ConcaveHull
  • ST_AsX3D
  • ST_GeomFromGeoJSON
  • ST_3DDistance
  • Integration with the PostgreSQL 9.1 extension system
  • Improved commandline shapefile loader/dumper
  • Multi-file import support in the shapefile GUI
  • Multi-table export support in the shapefile GUI
  • A geo-coder optimized for free US Census TIGER (2010) data

source: http://postgis.refractions.net/news/20120403/

2012-08-14

Update to the error correction process

Current Progress

We have decided to write the output of the tool to a spreadsheet in order to provide an audit trail. This will help with the manual checking for language duplicates.

Processing stages as follows:

1. Convert each country database to a PostgreSQL database.

2. Write raw output to the spreadsheet Tab 1.0 - all tables/columns with no modifications.

3. Automatically correct syntax errors:

  • Case Errors: i.e. [Table One & table one] [Column One & column one] => all converted to lower case [table one] [column one]
  • White Space: [table one ==> table_one] [column one ==> column_one]
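
A minimal sketch of the normalisation applied in step 3 (illustrative only - the merge tool itself is not written in Java):

```java
public class NameNormalizer {
    /** Lower-cases an identifier and replaces runs of whitespace with underscores,
     *  e.g. "Table One" -> "table_one", "Column One" -> "column_one". */
    public static String normalize(String identifier) {
        return identifier.trim().toLowerCase().replaceAll("\\s+", "_");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Table One"));   // table_one
        System.out.println(normalize("column One"));  // column_one
    }
}
```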

4. Merge full duplicate columns & append all unique columns - write output to spreadsheet Tab 2.0 highlighting columns with data type errors (half duplicates)

  • inconsistent datatype between columns of same name
  • inconsistent datatype size between columns of same name

Note: This has been done below - the pink column highlights how many datatype errors were found by the merge algorithm. For example, the column RecordNo of table ImportLog has both datatype and size inconsistencies.

Example Output

5. Begin correction for datatype errors > write output to spreadsheet Tab 2.1 proposing the corrections to the columns with inconsistent datatype - confirm these manually with team.

6. Merge half duplicate columns > write output to spreadsheet Tab 2.2 marking columns that were subject to datatype correction.

7. Give the user community the output spreadsheet Tab 2.2 so that manual checks for language duplicates can take place - duplicates marked in red.

8. Run first universal data import in order to identify any further import errors.

9. Correct any import errors along with the manually identified language duplicates in order to arrive at the final schema (Tab 3.0 onwards).

2012-08-07

Manual checking of language duplicates:

The merged data of all countries contains two database frameworks, English and French respectively. Although the merge algorithm can identify many problems, it cannot differentiate between a different name given to the same thing.

It is likely that some of the duplicate columns are actually language duplicates.

For example, the soils table:

  • Soil humidity
  • Humidite du sol

In order to identify these a manual check for French column names needs to take place within all tables.

Below is an overview of the analysis by category type:

Non Duplicates

Diagram illustrating how the algorithm filters each column into a logical category for processing:

Language Duplicates

Diagram illustrating language duplicates of the same datatype:

Language duplicate **same** data type

Diagram illustrating language duplicates of a different datatype:

Language duplicate **different** data type

Solution:

  • 1. Manually check the list of all duplicates for French column names.
  • 2. Cross check the same English tables to find out if a language duplicate exists.
  • 3. Where a language duplicate is found, compare the data type and cell values, noting any differences.
  • 4. Feed the identified issues into the SQL insert script => the script standardises the datatype, amends the column name to English and merges the column(s) => actions are stored in the SQL insert script for data import.
  • 5. Add differences in cell values ("green", "verte") to any new SQL statement, i.e. select * from table_name where column = 'green' OR column = 'verte';

Note on point 4: this assumes that values are subject to logical constraints on entry - this is not an immediate concern as long as the underlying framework is unified correctly and unnecessary duplicates are removed.

2012-08-02

Development of an additional processing filter.

Problem: different names given to, and different languages used within, the same tables.

During the data import, it has become apparent that an additional level of processing is needed. Although the processing algorithm can identify tables/columns that are not common to all databases, it cannot distinguish between the same thing written in a different language or case, because computers are syntax and case sensitive.

For example: [Column_Green, Column_Red, Column_Green] => would result in Column_Red being flagged as different.

HOWEVER, [Column_Green, Column_Red, Column_Verte] => would result in all three being flagged as different, although Column_Verte and Column_Green refer to the same column.

This is compounded by the fact that different case has been used to name the same table(s). The example below illustrates where lower case 'l' and upper case 'L' have been used to name the same 'LOCUST' table.

ALG Locust Table

DLC Locust Table

This results in the same data being stored in different tables, and requires different SQL statements to access the same data.

Solution: we have to design a new filter to list all tables/columns that are different in syntax, and manually cross check for errors.

We need to do three manual checks.

  • 1. Check each unique TableName to ensure that it is unique and not a language duplicate.
  • 2. Check each unique ColumnName to ensure that it is unique and not a language duplicate.
  • 3. Check each unique ColumnName.DataType to ensure we don't have 'language duplicates' that are also 'half duplicates' (not common in data type).

Case errors will be built into the filter for correction - to make this easier we plan to group the data into logical sets for manual checking - this responsibility will need to be shared amongst the wider team.

In addition we will modify the algorithm to facilitate and record ad hoc structural changes to the new framework for data import. Using our example above, the 'locust' table would have been corrected to 'Locust' - therefore data from the old table will need to be imported into the new table 'Locust'.

In effect we are trying to amalgamate 16 different systems and databases into a cohesive unified structure, so that the same query will work on all databases. We want to develop ONE system so that costs are kept to a minimum. Whilst this process is possible, it is not without its complications - therefore we need a dynamic way to handle and store the changes we make to the new unified framework - see the overview of the solution below.

The dynamic SQL insert script and stored changes

2012-07-31

Final data import into the unified schema

Current Progress

  • Completed implementing the interpreter.
  • Completed testing the interpreter.
  • Completed the virtualization classes.
  • In progress: debugging the filtering system - see below.

Current Problems

Problem 1

The current data merging has a problem when changing a column's data type, because we cannot change the column data type when the table contains data. To get around this we need to create an empty clone of the column with the new data type and populate it with the original column's content. However, when we try to merge the data from the country table with the unified schema table, we get an import error. We are working on a solution.
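
A minimal sketch of the clone-and-copy workaround described above (illustrative only - connection details are placeholders and the real tool builds these statements dynamically). It uses the WindSpeedStart smallint-to-double-precision case listed further down this page:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ColumnTypeCloneExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        try (Connection c = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/unified_schema", "user", "password");
             Statement st = c.createStatement()) {
            // 1. Create an empty clone of the column with the corrected data type.
            st.executeUpdate("ALTER TABLE control ADD COLUMN windspeedstart_new double precision");
            // 2. Populate it from the original column, casting the existing values.
            st.executeUpdate("UPDATE control SET windspeedstart_new = windspeedstart::double precision");
            // 3. Drop the original column and rename the clone into its place.
            st.executeUpdate("ALTER TABLE control DROP COLUMN windspeedstart");
            st.executeUpdate("ALTER TABLE control RENAME COLUMN windspeedstart_new TO windspeedstart");
        }
    }
}
```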

Problem 2

New category of column type duplicate by language

EACH column will fit into ONE of THREE categories.

  • 1. Full Duplicate - common to all
  • 2. Half Duplicate - common to all but different data type
  • 3. Non Duplicate - not common to all

However, we have noticed a situation where the same column is represented in two different languages.

For example:

  • Country_A.[table_1]
  • column_one
  • column_two
  • column_three
  • Country_B.[table_1]
  • column_one
  • column_deux
  • column_three

In this case the two columns column_two and column_deux would be treated as 'non duplicate columns' and so appended to the final schema - the resulting merge would look like this:

  • UnifiedSchema.[table_1]
  • column_one
  • column_two
  • column_deux
  • column_three

However, this causes a problem as these are actually "duplicates by language", i.e. the "same column" represented in different languages - therefore, for the database to be fully unified, we need to do the following:

  • 1. Produce the list of non duplicates
  • 2. Manually check to see what columns are "duplicate by language"
  • 3. Check that they have the same data type
  • 4. If yes to (3) merge them into the English column name

UnifiedSchema.[table_1].column_two + UnifiedSchema.[table_1].column_deux ==> UnifiedSchema.[table_1].column_two

This problem could affect anything up to 275 columns - these will need to be checked by a human being to see what columns are in fact the same.
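
A minimal sketch of step 4 for the column_two / column_deux example above (illustrative only - in practice the equivalent actions are stored in the SQL insert script used for the data import):

```java
import java.util.List;

public class LanguageDuplicateMergeExample {
    /** Builds the statements that fold a language-duplicate column into its
     *  English equivalent and then removes the duplicate. Names are illustrative. */
    public static List<String> mergeStatements(String table, String english, String french) {
        return List.of(
            "UPDATE " + table + " SET " + english + " = COALESCE(" + english + ", " + french + ");",
            "ALTER TABLE " + table + " DROP COLUMN " + french + ";");
    }

    public static void main(String[] args) {
        // UnifiedSchema.table_1: column_deux merged into column_two, then dropped.
        mergeStatements("table_1", "column_two", "column_deux").forEach(System.out::println);
    }
}
```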

2012-07-17

Virtualisation of the data and data structure to arrive at global unification

Stages.

  • 1. Import individual MS Access databases into individual Postgres databases.
  • 2. Merge the Postgres databases from step one into a global merge, including errors and duplicates.
  • 3. Import each individual Postgres database (data) and the global merge (structure) into the virtualisation classes and into memory.
  • 4. Read the changes required by the user group into the interpreter class.
  • 5. Modify the virtual data structure and import the data from memory.
  • 6. Output the merge into a new database for each country.

The illustration below gives an overview of the process involved in completing the data merge and import.

CurrentBridgedSystem

2012-07-11

Server Migration Process

The illustration below gives an overview of the temporary server structure whilst migration is in progress - items in RED are yet to be completed.

CurrentBridgedSystem

2012-07-10

The new platform will enable developers to quickly and easily build customised tools

See examples below

USER DEFINED SPATIAL QUERIES

MEAN NDVI over time for user selected region

NDVI over time for user selected region

Mean daily rainfall over time for user selected region

Rainfall over time for user selected region

Flightpath analysis

Flightpath analysis

MULTIPLE DATASET SPATIAL QUERIES

Regional summary of multiple data

Regional summary of multi data

Radial summary of multiple data

Radial summary of multi data

Bounding box summary of multiple data

Bounding box summary of multi data

COST/PREDICTIVE SURFACE MODELLING

Density Risk Modelling based on rainfall/temperature/vegetation/wind

Density Risk Modelling based on rainfall/temperature/vegetation/wind

Risk model output

Model output

FEATURE SELECTION AND BACKGROUND MAPPING

Feature selection

Feature selection

Background imagery

Background imagery

2012-07-05

System Unification

The diagram below shows the system data flow - note: the filters check both attribute and spatial data.

Data Flow


2012-07-04

Roadmap

Current Progress

Below is the system development road map - this shows how the system fits together as a single entity and the direction it is taking. Not only will the system be open, but duplication of effort will be significantly reduced. The modular architecture will enable developers to work individually on the project, without compromising the integrity of the system.

Road Map

2012-07-03

Database unification update - categories of operation required.

Current Progress

The types of conversion required to date are as follows (note: this does not include the Yemen back end):

  • character varying
  • character
  • double precision
  • integer
  • text
  • real
  • smallint
  • boolean

The data types above need to be converted so that a single data type exists for each column_name. The example output below illustrates this issue: Algeria has a different data type from the other countries for Wind Speed.

  • ===== WindSpeedStart =====
  • ALG_BE.Control.WindSpeedStart = smallint
  • DLC_BE.Control.WindSpeedStart = double precision
  • EGY_BE.Control.WindSpeedStart = double precision
  • ETH_BE.Control.WindSpeedStart = double precision
  • IND_BE.Control.WindSpeedStart = double precision
  • IRN_BE.Control.WindSpeedStart = double precision
  • LIB_BE.Control.WindSpeedStart = double precision
  • LocustBE.Control.WindSpeedStart = double precision
  • OMN_BE.Control.WindSpeedStart = double precision
  • PAK_BE.Control.WindSpeedStart = double precision
  • SAU_BE.Control.WindSpeedStart = double precision

To correct this problem we will develop THREE types of operation.

  • 1. Conversion.Op
  • TableName => NewTableName
  • ColumnName => NewColumnName
  • DataType => NewDataType

(the data above will be dynamic in that it will feed in from an external input file) - example syntax as follows (note: identifiers in bold):

  • converted: table:tableName; column:col3; datatype:oldDataType=>newDataType
  • converted: table:tableName=>newTableName

  • 2. Delete.Op

(Remove unwanted tables/columns)

  • 3. Merge.Op

(Remove redundant full duplicate columns)
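
As an illustration of how a Conversion.Op line from the external input file could be read (a sketch only - the real tool's parser and exact file format may differ):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses one 'converted:' line from the external change file.
 *  A sketch only - the real tool's parser and exact file format may differ. */
public class ConversionOpParser {
    private static final Pattern LINE = Pattern.compile(
            "converted:\\s*table:(\\w+);\\s*column:(\\w+);\\s*datatype:([\\w ]+)=>([\\w ]+)");

    public static void main(String[] args) {
        // Mirrors the WindSpeedStart example above (smallint in Algeria, double precision elsewhere).
        String line = "converted: table:Control; column:WindSpeedStart; datatype:smallint=>double precision";
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            System.out.printf("Conversion.Op: %s.%s %s -> %s%n",
                    m.group(1), m.group(2), m.group(3), m.group(4));
        }
    }
}
```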

The changes made to the first draft of the unified schema will feed into the data import mechanism because the two processes are related.

See below a screenshot of the latest analysis, which includes data from India and Pakistan (excluding the Yemen front end).

  • The universal set (all data) is 97.2% complete.
  • The databases are 94.9% similar.
  • The data type irregularities affect 3.9% of all columns.
  • 1.2% of all columns are not universal to each table.
  • Total column count is 22681.

Data Output

2012-07-02

Database unification process for the wider group

Current Progress

Problem and methodology:

This problem can best be described as a giant dinner party for 20,000 guests where some (an unknown number) have been given the same seat by mistake. We have to account for all guests as they arrive - only once we have done so can we move all guests to the dining room. However, before we do this we need to explain a simple rule - if your seat is already taken, please stand behind the person sitting at the table. Once the dining room is full and all guests are seated, we can add the number seated to the number standing, and check against the total number that arrived. This way we know how many additional seats we need, and can begin to solve the problem.

We have a similar problem with the many columns of data we need to merge. We must ensure that we can agree the total number of columns we expect and categorise them BEFORE we begin to make changes and move them around.

Broadly speaking the data can be split into four distinct groups:

  • Full duplicates - table name, column name and data type are the same in all databases.
  • Half duplicates - table name and column name are the same, but the data type is different, in all databases.
  • Non duplicates - the column name does not appear in all databases.
  • Misc - random tables that do not fit into any of the above three categories - they can be checked manually and discarded.

See a more logical representation below.

Unification Process BEFORE

Unification Process AFTER

2012-06-28

Development of the metadata output in order to merge databases into final schema

Current Progress

The merge and import tool is approximately 80% complete and development is continuing to ensure that the tool is self-testing. This is important because we need to make sure that there are no technical problems, before we begin to import each country into the new unified schema.

The output of the tool currently lists the existing structure by a term called half duplicates. This is where Database Name + Table Name + Column Name are the same, BUT the Data Type is different. In order to see where conversion between columns needs to take place, we list the analysis by grouping each Table Name in the format DatabaseName_TableName_ColumnName = Data Type. Work is currently focused on categorising and listing the conversion types required to ensure all columns have the same Data Type. With these requirements established, the group can then agree on the conversion logic. One important consideration during this process is to avoid precision loss - for example, we would prefer to go from int (1) to double (1.00), as opposed to double (1.00) to int (1).
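
A minimal sketch of that widening rule (illustrative only - the actual conversion logic will be agreed by the group, and unlisted types would need manual review):

```java
import java.util.List;

public class TypeWideningExample {
    // Ordered from narrowest to widest; converting "up" this list avoids precision loss.
    // Types not listed here (text, character varying, etc.) would need manual review.
    private static final List<String> WIDENING_ORDER =
            List.of("boolean", "smallint", "integer", "double precision");

    /** Returns the wider of two data types, so int (1) becomes double (1.00)
     *  rather than double (1.00) being truncated to int (1). */
    public static String wider(String a, String b) {
        return WIDENING_ORDER.indexOf(a) >= WIDENING_ORDER.indexOf(b) ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(wider("smallint", "double precision")); // double precision
    }
}
```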

Once the conversion logic has been established, we will be able to integrate this into the tool, and begin testing of the process start to finish.

To ensure we include the data structure from all countries, we currently need to analyse the databases from India, Pakistan and Yemen.

2012-05-24

Analysis of the database files by the 'merge import tool' - tool overview below

Algorithm Process Algorithm Process Algorithm Process

Current Progress

Individual databases are imported using the Add access dbs button.

Individual MS Access databases are imported into Postgres as individual country databases (ICDs) where they are checked and merged into a single unified schema.

<Options>

  • Merge - creates the new physical database schema from all ICDs
  • Analize - reads the tables and columns of all ICD databases and performs an intersection (tables/columns common to all).
  • Self Check - checks all tables/columns from each MS Access database against its respective ICD clone.

<Functions>

  • Merge data Button - adds data from each ICD into the newly created unified schema for that ICD.
  • Start migration Button - migrates all Access databases into ICDs.
  • Add dbs Button - adds input MS Access databases.
  • Refresh Button - refreshes the ICD list (right).

Results to date:

India, Pakistan and Yemen seem to have corrupt databases so will need to be requested again.

All databases (MS Access/ICD) have been analysed. The tool has managed to provide the following statistics:

  • Total # databases = 32
  • Total # columns = 20247
  • Total # duplicates matched by table_name, column_name & data_type = 19247 (95.06%)
  • Total # half duplicates matched only by table_name & column_name, but NOT data_type = 725 (3.58%)
  • Total # non duplicate columns = 275 (1.35%)

Summary: the universal set of databases are 95.06% the same. We will have to correct 3.58% of the data type manually, and append a further 1.35% to the existing structure as non duplicate columns. Note: these results exclude parts of India, Pakistan and Yemen.

The half duplicates are structural errors that arose from the early design of the system. Individual countries were provided with their own databases - the system became more homogeneous as time passed. Thanks to the tool's example output (below), we will be able to correct these issues as a team.

  • IRN_BE.Code 49.rec = smallint
  • ALG.Code 49.rec = integer
  • ETH_BE.Code 50.rec = smallint
  • LocustBE.Code 49.rec = smallint
  • IRN_BE.Code 50.rec = smallint
  • DLC_BE.Code 49.rec = smallint
  • ALG.Code 50.rec = integer
  • DLC_BE.Code 50.rec = smallint
  • LocustBE.Code 50.rec = smallint

Updates are being made to display the analysis results within the UI.

2012-05-16

Development of the algorithms needed to analyse all databases in order to merge into final schema - algorithm overview below

Algorithm Process

Current Progress

Development is underway to unify all databases into a single cohesive data structure - specialist algorithms have been developed to analyse the databases and produce solid metadata to assist with the process. The result will be a data schema S1 that the group can finalise as S1.1 before we commence the design of the main data import algorithms.

2012-05-10

Development of the new merge and import tools - below is a screenshot of the database merge tool.

New database dump tool

New database table creation tool

Current Progress

MS Access is proprietary technology (Microsoft) and has not been designed in an open manner. Therefore extracting data and merging the many tables is not without problems, especially considering we have to deal with thousands of random column names, most of which make no sense - code 50 & code 601, for example...

For this reason (following on from and re-checking the spreadsheet) we are currently developing two tools, a merge tool and an import tool - these will extract the schemas from each MS Access database and merge them into a single unified PostgreSQL database.

The solution is as follows.

1. Read all databases using a specifically designed SQL statement.

2. Design a custom merge tool that extracts the create statements for the tables of all the databases.

3. Using the extracted information ---> create the tables using the following logic:

If a table exists { read its structure and compare it with the create statement. }

This tool will enable us to compare tables column by column; when a new column is found it is added to the table. For each step within the algorithm, a log file will automatically track relevant information such as duplicates, unexpected additional columns, etc.
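
A minimal sketch of that comparison logic (illustrative only - the actual tool is built with the MS Interop assemblies rather than Java, and the column data here is hard-coded for the example):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Compares a country table's columns against the unified table and reports
 *  what would be added or logged. Names and types are illustrative. */
public class TableMergeSketch {
    public static void main(String[] args) {
        Map<String, String> unified = new LinkedHashMap<>();
        unified.put("rec", "integer");

        Map<String, String> country = new LinkedHashMap<>();
        country.put("rec", "smallint");        // half duplicate: same name, different type
        country.put("observer_name", "text");  // non duplicate: missing from unified table

        country.forEach((column, type) -> {
            if (!unified.containsKey(column)) {
                System.out.println("ADD COLUMN " + column + " " + type);   // append to the table
                unified.put(column, type);
            } else if (!unified.get(column).equals(type)) {
                System.out.println("LOG: datatype mismatch on " + column
                        + " (" + unified.get(column) + " vs " + type + ")");
            } else {
                System.out.println("LOG: full duplicate " + column);
            }
        });
    }
}
```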

The output of this process will be the creation of the new SQL code that will be used to create the unified data schema - this can be modified as required to suit the new system(s) i.e. code 50 could become LUTSurveyTechnique & code 601 could become LUTWindDirection - (LUT = Look Up Table).

2012-05-05

Development of unified data model for all countries

Current Progress

Work is underway to amalgamate the individual databases from each country - the proposal is to import each MS Access database into the new PostgreSQL system and amalgamate them into a single Postgres database. It is hoped that this will show duplicate tables and fields. There are currently hundreds of tables and approximately 1881 fields to reconcile - this part of the project is essential to arriving at a unified database system for all countries and HQ.

2012-05-03

PostGIS v Google Earth Accuracy Tests

PostGIS v Google Earth Accuracy Test

Current Progress

Sub-25m accuracy within the new spatial database system(s) is expected - initial testing shows PostGIS to be highly accurate as it takes the curvature of the earth (WGS84 ellipsoid) into consideration when calculating the distance between objects.

2012-05-01

Completion of new layer tree panel view to show new symbol set

Current Progress

This feature has now been completed. The user will be able to view the symbol set underneath the layer on the left hand side of the map. We have added the functionality as a checkbox to the customise menu - this way the user can turn the feature on/off as required.

2012-04-25

A new issue tracking system is in place

Current Progress

You can now raise issues directly with the developer community by clicking the 'Go to the issues>>' link at the top of this Wiki. Issues can be raised and sent to the development team in real time; they will be able to respond to you directly, leaving a copy of their response against the relevant issue for others to see. You can raise issues for the following categories, as well as track issues raised by others:

  • bug
  • enhancement
  • proposal
  • task

Each issue can be given one of five priorities:

  • trivial
  • minor
  • major
  • critical
  • blocker

It is hoped that this system will encourage a greater feeling of involvement and teamwork amongst the user community and development team.

2012-04-22

Development of new layer tree panel view to show new symbol set

New RAMSES symbol set

Current Progress

We are currently developing a legend to show the new symbol set. These will be shown alongside the layer names in the left hand tree panel view.

2012-04-19

Development of PostGIS database model

new spatial database system

Current Progress

We are currently involved in developing the new database system. This system is one of the most advanced spatial database systems in the world - it is capable of supporting over 3000 different geographic and Cartesian projection systems. This system will provide the underlying framework for the global forecast operation.

2012-04-12

Alpha release of RAMSES to the Cairo workshop

New RAMSES application

Current Progress

We have managed to work on several new functions for the upcoming demo at the Cairo workshop. The main changes we've made have been adding the new database system and query mechanism.

Other enhancements have included the following features:

  • New icons have been added to the main user interface.
  • A new spatial database system has been implemented to replace MS Access.
  • The project is entirely built on open source technologies.
  • We will be able to extend and deliver new functionality easily using the plug-in structure.
  • The new system has been fully internationalized and will support English, French and Arabic out of the box.
  • The system will automatically save user queries when the project is saved.
  • The splash screen has been completely redesigned.
  • The date selection interface has been simplified.
