Prepare some use cases for the first version

Issue #40 new
Roman Simakov created an issue

We need several customer use cases that would satisfy customers in China, and we need to build a cluster and database that can handle them before releasing the first version.

Comments (9)

  1. yang theseus

    Over the next few days I will collect requirements from customers, mainly for GIS applications, and talk with them about how to integrate PhoenixDB with a GIS application. After that I will send the team detailed documents.

  2. Roman Simakov Account Deactivated reporter

    May I suggest setting up the same cluster you already built with SciDB? I think it's important to start with something already tested. The requirements can be collected as queries that load and analyze customer data. That will stay close to the customer and be easy for development to understand!

  3. yang theseus

    Our first use case has the following business requirements: unstructured data scraped from the internet and structured data extracted from an MIS system in Oracle both need to be imported into PhoenixDB. A front-end application then queries all of the data in PhoenixDB. The data quality is poor. The data scraped by the spider has properties such as company name, contact number, province, city, county, and detailed address. The data extracted from Oracle also has properties including company name and detailed address. Many properties from the internet and intranet sources are similar but not identical, so they are hard to match exactly. If we use the RegExp function in PhoenixDB, we have to write many regular expressions to match each keyword, and the results are poor because it cannot do fuzzy matching. I think we need to either integrate a search engine with PhoenixDB or implement a per-column Search() function in PhoenixDB. I am considering Sphinx (Python) for PhoenixDB-py or Elasticsearch (Java) for JDBC. @Roman Do you have a better proposal?

  4. yang theseus

    For example, in the first table, "淮北矿业股份有限公司袁店一井煤矿" is the company name and "安徽省淮北市濉溪县" is the detailed address; in the second text, "淮北矿业(集团)有限责任公司" is the company name and "安徽" is the province. The two names actually refer to the same company, so '淮北矿业' and '安徽' are the search keywords. PhoenixDB needs to match entries by the keywords '淮北矿业' and '安徽' across different columns in different tables, then return the matching results.

  5. Roman Simakov Account Deactivated reporter

    I have no better proposal. I think we need a set of operators, such as ES_INDEX, ES_SEARCH, etc., that integrate us with Elasticsearch and solve such tasks.

  6. Roman Simakov Account Deactivated reporter

    I suggest splitting the work: 1) You prepare the metadata of the PhoenixDB database, i.e. how it will look, prepare some data to load into it, and, if possible, try to develop queries against PhDB using "imaginary" operators like ES_SEARCH. 2) We make this operator work, i.e. implement ES_SEARCH and make any other changes.

    Is it possible?

    P.S. If possible, please make the example data in English. It's really difficult for us to analyze Chinese characters, match them, check the correctness of results, etc. After English, we will both check it in Chinese as well.

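The matching requirement described in comments 3 and 4 can be sketched in plain Python, purely as an illustration of the expected behaviour and not as PhoenixDB code. The table names, column names, and the `search` and `record_matches` helpers are assumptions made up for this sketch. It only does substring keyword matching across columns; real fuzzy matching would still need a search engine such as the ones proposed above.

```python
def record_matches(record, keywords):
    """Return True if every keyword occurs in at least one column of the record."""
    values = [str(v) for v in record.values()]
    return all(any(kw in value for value in values) for kw in keywords)


def search(tables, keywords):
    """Scan every table and collect (table_name, row) pairs containing all keywords."""
    hits = []
    for table_name, rows in tables.items():
        for row in rows:
            if record_matches(row, keywords):
                hits.append((table_name, row))
    return hits


# Hypothetical layout: table and column names are assumptions for this sketch.
tables = {
    "table_1": [
        {
            "company_name": "淮北矿业股份有限公司袁店一井煤矿",
            "detailed_address": "安徽省淮北市濉溪县",
        },
    ],
    "table_2": [
        {"company_name": "淮北矿业(集团)有限责任公司", "province": "安徽"},
        # Made-up row, added only to show a record that should not match.
        {"company_name": "Example Trading Co.", "province": "Beijing"},
    ],
}

results = search(tables, ["淮北矿业", "安徽"])
for table_name, row in results:
    print(table_name, row)
```

Both rows from comment 4 are returned even though the keywords sit in differently named columns, which is the behaviour a Search() function or an ES_SEARCH-style operator would need to reproduce inside the database.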