phoenixdb / issues / #40 - Prepare some use cases for the first version

Issue #40 new

Roman Simakov created an issue 2015-10-28

We need several customer use cases that would satisfy customers in China, and we need to build a cluster and database that can handle them before releasing the first version.

Comments (9)

Roman Simakov Account Deactivated reporter
- changed version to 0.1.0
- 2015-11-22T16:34:56+00:00
yang theseus
Over the next few days I will collect requirements from customers, mainly for GIS applications, and talk with them about how to integrate PhoenixDB with their GIS applications. After that I will send our team the detailed documents.
- 2015-11-23T06:05:24+00:00
Roman Simakov Account Deactivated reporter
May I suggest setting up the same cluster as you already did with SciDB? I think it's important to start with something already tested. The requirements can be collected in the form of queries that load and analyze the customers' data. That keeps them close to the customer and understandable to development!
- 2015-11-23T06:25:07+00:00
yang theseus
Our first use case has the following business requirements: unstructured data scraped from the internet needs to be imported into PhoenixDB, together with structured data extracted from an MIS system in Oracle. The front-end application then queries all of the data in PhoenixDB. The data quality is poor. The data scraped by the spider has properties such as company name, contact number, province, city, county, and detailed address. The data extracted from Oracle also has properties including company name and detailed address. Many properties from the internet and intranet sources are similar but not identical, so they are hard to match exactly. If we use the RegExp function in PhoenixDB, we need to write a lot of regular expressions to match each 'keyword'; the results are poor and it cannot do fuzzy matching. I think we need to integrate a search engine with PhoenixDB, or implement a per-column Search() function in PhoenixDB. I am considering Sphinx (Python) for PhoenixDB-py or Elasticsearch (Java) for JDBC. @Roman Do you have a better proposal?
- 2016-01-16T03:32:21+00:00
yang theseus
- 2016-01-16T05:33:01+00:00
yang theseus
- 2016-01-16T05:52:18+00:00
yang theseus
For example, in the first table, "淮北矿业股份有限公司袁店一井煤矿" is the company name and "安徽省淮北市濉溪县" is the detailed address; in the second text, "淮北矿业(集团)有限责任公司" is the company name and "安徽" is the province. The two company names actually refer to the same company, so '淮北矿业' (Huaibei Mining) and '安徽' (Anhui) are the keywords for searching. PhoenixDB needs to match entries by the keywords '淮北矿业' and '安徽' across different columns in different tables, and then return the matching rows.
- 2016-01-16T06:03:30+00:00
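The matching described in the comment above can be sketched in plain Python, outside PhoenixDB. This is only an illustration of the idea, not a PhoenixDB feature: the table names, field names, and the `search` helper are assumptions made for the example, and only the string values come from the comment. A row matches when every keyword occurs as a substring in at least one of its columns.

```python
# Hypothetical rows: field names are assumptions, values are from the comment above.
oracle_rows = [
    {"company_name": "淮北矿业股份有限公司袁店一井煤矿", "address": "安徽省淮北市濉溪县"},
]
spider_rows = [
    {"company_name": "淮北矿业(集团)有限责任公司", "province": "安徽"},
]


def matches_keywords(row, keywords):
    """Return True if every keyword occurs in at least one column of the row."""
    return all(any(kw in value for value in row.values()) for kw in keywords)


def search(tables, keywords):
    """Collect (table_name, row) pairs whose columns contain all keywords."""
    hits = []
    for name, rows in tables.items():
        for row in rows:
            if matches_keywords(row, keywords):
                hits.append((name, row))
    return hits


tables = {"oracle": oracle_rows, "spider": spider_rows}
hits = search(tables, ["淮北矿业", "安徽"])
for table_name, row in hits:
    print(table_name, row["company_name"])
```

Plain substring matching like this is exact, not fuzzy, which is why the thread goes on to consider a dedicated search engine.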
Roman Simakov Account Deactivated reporter
I have no better proposal. I would suggest a set of operators, such as ES_INDEX, ES_SEARCH, etc., that integrate us with Elasticsearch and solve such tasks.
- 2016-01-16T09:07:16+00:00
Roman Simakov Account Deactivated reporter
I suggest splitting the work: 1) You prepare the metadata of the PhoenixDB database, i.e. how it will look; prepare some data to load into it; and, if possible, try to develop queries to PhDB using "imaginary" operators like ES_SEARCH. 2) We make this operator work, i.e. implement ES_SEARCH and make other changes.

Is it possible?

P.S. If possible, try to make the example data in English. It's really difficult for us to analyze Chinese characters, match them, check the correctness of results, etc. After English, we and you will check it in Chinese as well.
- 2016-01-16T09:54:32+00:00
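As a rough sketch of what an operator like ES_SEARCH might send to Elasticsearch, the snippet below builds a request body using the Elasticsearch query DSL (`bool` / `must` / `match` with `fuzziness`). ES_SEARCH itself does not exist yet, so the `build_es_search_body` helper, the field names, and the English sample values are all assumptions for illustration; nothing here reflects an actual PhoenixDB interface.

```python
import json


def build_es_search_body(keywords_by_field, fuzziness="AUTO"):
    """Build an Elasticsearch query body requiring a fuzzy match on every field.

    keywords_by_field maps a (hypothetical) field name to the keyword to look for.
    """
    must = [
        {"match": {field: {"query": text, "fuzziness": fuzziness}}}
        for field, text in keywords_by_field.items()
    ]
    return {"query": {"bool": {"must": must}}}


# Hypothetical English sample data, as requested in the P.S. above.
body = build_es_search_body({"company_name": "Huaibei Mining", "address": "Anhui"})
print(json.dumps(body, indent=2))
```

The body is a plain dictionary, so it can be inspected and compared against expected results before any Elasticsearch cluster is involved.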

Assignee: yang theseus

Type: task

Priority: critical

Status: new

Component: unassigned component

Milestone: unassigned milestone

Version: 0.1.0

Votes: 0

Watchers: 2
