Use Pandas Series, DataFrame and Panel with Monary

Issue #19 new
Femto Trader
created an issue

Hello,

Support for Pandas Series, DataFrame, Panel will be great.

This article could help http://alexgaudio.com/2012/07/07/monarymongopandas.html

Kind regards

Comments (3)

  1. Femto Trader reporter

    For DataFrame we could use pd.DataFrame.to_dict code https://github.com/pydata/pandas/blob/master/pandas/core/frame.py#L814 with orient='split' but with some modifications to output dict of Numpy Array (instead of dict of list)

    Something like

    from pandas.core.common import _maybe_box_datetimelike,
    import pandas.lib as lib
    
    d = {'index': df.index.values,
        'columns': df.columns.values,
        'data': lib.map_infer(df.values.ravel(), _maybe_box_datetimelike)
        .reshape(df.values.shape).values}
    
  2. Femto Trader reporter

    An other approach with Pandas DataFrame

    Sample data from https://drive.google.com/file/d/0B8iUtWjZOTqla3ZZTC1FS0pkZXc/view?usp=sharing

    In [1]: import pandas as pd
    In [2]: df = pd.read_csv("AUDUSD-2014-01.csv", names=['Symbol', 'Date', 'Bid', 'Ask'])
    In [3]: df
    Out[3]:
              Symbol                   Date      Bid      Ask
    0        AUD/USD  20140101 21:55:34.404  0.88796  0.88922
    1        AUD/USD  20140101 21:55:34.444  0.88805  0.88914
    2        AUD/USD  20140101 21:55:34.475  0.88809  0.88910
    3        AUD/USD  20140101 21:55:48.962  0.88811  0.88908
    4        AUD/USD  20140101 21:56:38.293  0.88808  0.88887
    ...          ...                    ...      ...      ...
    1947101  AUD/USD  20140131 21:59:48.048  0.87525  0.87589
    1947102  AUD/USD  20140131 21:59:54.599  0.87527  0.87589
    1947103  AUD/USD  20140131 21:59:56.927  0.87531  0.87588
    1947104  AUD/USD  20140131 21:59:59.365  0.87531  0.87574
    1947105  AUD/USD  20140131 22:00:00.038  0.87531  0.87574
    
    [1947106 rows x 4 columns]
    
    In [4]: df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d %H:%M:%S.%f')
    In [5]: df = df[['Bid', 'Ask']]  # I'm just storing float for now!
    In [6]: lst_cols = list(map(lambda col:  ma.masked_array(df[col].values,df[col].isnull()), df.columns))
    In [7]: mparams = monary.MonaryParam.from_lists(lst_cols, list(df.columns), ['float64', 'float64'])
    In [8]: %time m.insert('monary_db', 'ticks', mparams)
    CPU times: user 1.88 s, sys: 466 ms, total: 2.35 s
    Wall time: 59.7 s
    

    So we are storing nearly 65000 floats per seconds

    In [9]: 1947106*2 / 60
    Out[9]: 64903.53333333333
    

    Retrieve data can be done using

    arrays =m.query(db_name,collection_name, {}, columns, ['float64', 'float64'])
    

    arrays is a list of masked array

    [masked_array(data = [0.8879600000000001 0.88805 0.8880899999999999 ..., 0.87531 0.87531 0.87531],
                  mask = [False False False ..., False False False],
            fill_value = 1e+20),
     masked_array(data = [0.8892200000000001 0.8891399999999999 0.8891 ..., 0.8758799999999999
      0.87574 0.87574],
                  mask = [False False False ..., False False False],
            fill_value = 1e+20)]
    

    How to make a DataFrame from this ?

    df_retrieved = pd.DataFrame.from_records(arrays).T
    df_retrieved.columns = columns
    
  3. Log in to comment