Warning!

lifelog.rb v0.2 represents almost a complete rewrite from v0.1. In particular, both the input and output formats have changed significantly, as have the configuration options. Make sure you back up your Day One journal and re-read the documentation below before upgrading!

While these changes are jarring, they will also allow lifelog.rb to be extended and improved in a less invasive fashion in the future.


lifelog.rb

This is a simple life logging script designed to work with a variety of input sources and provide a variety of possible outputs. Currently only a single input and output module are implemented, but additional modules are planned. Contributions welcome!

Requirements

lifelog.rb was developed under Ruby 2.1.3; it may very well run with earlier versions, but I haven't tested it. The core script only requires the date and yaml gems, which should come standard as part of your Ruby installation.

Modules may require additional gems to function, however; see the discussion below detailing the currently included modules for additional details.

Configuration

lifelog.rb is configured via a simple YAML file at ~/.lifelog.yml; an almost-ready-to-use version of this file can be found in this repository with the name lifelog.yml. It contains the following configuration options:

  • module_path: A string defining the path where input and output modules can be found. This path will be appended to Ruby's $LOAD_PATH.

  • lock_file: A string defining where lifelog.rb's lock file should be written to. This file will contain the timestamp of the most recent run, which may be used by input and output modules. It is suggested, but not required, that this file be stored in a location that provides persistence ($HOME rather than /tmp, for example).

  • input: A hash (key-value mapping) specifying the input modules that should be used. Each key should be a module name (case-sensitive); this name, lower-cased, should also be used as the file name of the input module itself. The value of each key should normally be another hash specifying the options for this module. You must include at least one input module.

  • output: A hash (key-value mapping) specifying the output modules that should be used. Each key should be a module name (case-sensitive); this name, lower-cased, should also be used as the file name of the output module itself. The value of each key should normally be another hash specifying the options for this module. You must include at least one output module.

The provided example lifelog.yml file should give you a good idea of how these different sections fit together in practice; details concerning the modules included in this repository are below.
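
As a rough illustration of how these options fit together, here is a minimal sketch of loading and sanity-checking such a configuration. The load_config helper and the paths in the example are hypothetical (they are not taken from lifelog.rb itself), and the real script reads ~/.lifelog.yml rather than an inline string:

```ruby
require "yaml"

# Hypothetical helper (not part of lifelog.rb): parse configuration text and
# check the constraints described above.
def load_config(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  %w[module_path lock_file].each do |key|
    raise ArgumentError, "missing #{key}" unless config[key].is_a?(String)
  end
  %w[input output].each do |section|
    modules = config[section]
    unless modules.is_a?(Hash) && !modules.empty?
      raise ArgumentError, "#{section} must list at least one module"
    end
  end
  config
end

# Illustrative configuration; every path here is made up.
example = <<CONFIG
module_path: /home/myuser/.lifelog/modules
lock_file: /home/myuser/.lifelog/lock
input:
  YamlFiles:
    input_dir: /home/myuser/Dropbox/lifelog
    archive_dir: /home/myuser/Dropbox/lifelog-archive
output:
  DayOne:
    journal_dir: /home/myuser/Dropbox/Journal.dayone
CONFIG

config = load_config(example)

# Module file names are the lower-cased module names.
input_files = config["input"].keys.map { |name| "#{name.downcase}.rb" }
```

Here input_files ends up holding "yamlfiles.rb", which is the file that would be looked up in module_path.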

Modules

The lifelog.rb script itself has the following workflow:

  1. Load configuration data from ~/.lifelog.yml.

  2. Determine the timestamp for the current run.

  3. Load each input module and call the associated read method. read should return an array of hashes, which will be appended to a master journal array.

  4. Eliminate any duplicate entries from journal.

  5. Load each output module and pass journal to the associated write method. The journal array cannot be modified by any write method!

  6. Write out the current run's timestamp to the lock_file.

That's it. As you can see, lifelog.rb itself doesn't know anything about reading in log data or writing that data out. All it does is aggregate a set of inputs (these might come from files, RSS feeds, API calls, or any combination of sources), and then pass all of this data along to a (possibly) separate set of modules charged with generating outputs (outputs might write out data files for another application, blog entries, database records, whatever).

The secret sauce thus lives entirely in the associated input and output modules.
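
The workflow above can be sketched in a few lines of Ruby. This is only an approximation under stated assumptions: DemoInput, DemoOutput, and run are hypothetical names, the modules are passed in directly rather than loaded from module_path, and freezing the journal is just one possible way to keep write methods from modifying it (the actual mechanism used by lifelog.rb may differ).

```ruby
require "date"

# Hypothetical stand-ins for modules that would normally live in module_path.
module DemoInput
  def self.read(config, software_id, last_run, current_run)
    entry = { "message" => "Hello", "timestamp" => 0 }
    [entry, entry.dup] # deliberately return a duplicate entry
  end
end

module DemoOutput
  def self.write(journal, config, software_id, last_run, current_run)
    @seen = journal.length
  end

  def self.seen
    @seen
  end
end

# inputs and outputs map module objects to their configuration data.
def run(inputs, outputs, software_id, last_run)
  current_run = DateTime.now                 # step 2
  journal = []
  inputs.each do |mod, config|               # step 3
    journal.concat(mod.read(config, software_id, last_run, current_run))
  end
  journal.uniq!                              # step 4: hashes compare by content
  journal.each(&:freeze)
  journal.freeze                             # step 5: read-only for write methods
  outputs.each do |mod, config|
    mod.write(journal, config, software_id, last_run, current_run)
  end
  current_run                                # step 6 would persist this value
end

finished = run({ DemoInput => {} }, { DemoOutput => {} }, "lifelog-demo", nil)
```

After this run DemoOutput has seen a single entry, since the duplicate was dropped before any write method was called.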

Included Modules

The following modules are already part of the lifelog.rb repository; they provide basic functionality and serve as examples for individuals interested in implementing their own modules. Adding new modules is as simple as dropping an appropriately-named Ruby file containing the new module in module_path and adding a matching configuration block to ~/.lifelog.yml.

Check below for a more in-depth discussion of the standard key-value pairs and conventions mentioned here.

Inputs

The following input modules are currently included by default with lifelog.rb.

YamlFiles

The YamlFiles input module reads entries in from a directory of files, each of which is assumed to be more-or-less properly formatted YAML and to hold a single log entry. Entries for the current day are ignored (on the assumption that the day's log is still incomplete), as are non-metadata, non-image entries prior to the last_run timestamp held in the lifelog.rb lock_file (if no such timestamp exists, then all entries prior to today are read). All non-metadata entries are then archived in a directory of the user's choice; metadata entries are held for a longer period of time so as to provide context for late-arriving image entries.

YamlFiles accepts the following configuration options:

  • input_dir: The directory from which entry files should be read. If this value is not set, then YamlFiles will abort and return an empty array!

  • archive_dir: The directory that processed entries should be moved to. If this value is not set, then YamlFiles will abort and return an empty array!

  • file_extension: The entry file extension; files in input_dir with an extension different from this will not be read or archived. If not set, defaults to yml.

  • date_format: The Ruby DateTime strftime spec to use for parsing entry date keys. Defaults to %Y-%m-%d %H:%M:%S %Z if not set.

  • archive_age: The number of days that old metadata entries should be preserved. Defaults to 30 if not set.

  • geonames_username: The GeoNames username to use for latitude/longitude-to-timezone lookups. Defaults to nil, which will disable this lookup. Note that it is not sufficient to just have a GeoNames account -- you will also need to enable web API services at http://www.geonames.org/manageaccount.

  • tz_countries: A list of ISO 3166-1 alpha-2 country codes to search for timezone information; defaults to the empty list ([]). This is mostly a convenience variable -- without it, lifelog.rb should still determine the correct timezone of the entry, but may use a less sensible IANA timezone name than might otherwise be desired.

YamlFiles requires the following Ruby gems in order to function: date, fileutils, geocoder, rmagick, securerandom, timezone, tmpdir, tzinfo, and yaml. Internally, the read method of this module uses the following workflow:

  1. Read in all entries from input_dir and set the date, datetime, and timestamp keys using either the date key in the file or the file modification time if no date key is available.

  2. Set any entries that occur on the current day to be ignored; also ignore non-metadata, non-image entries that occur before lifelog.rb's previous run.

  3. Archive all non-metadata entries that occur before today, and all metadata entries older than archive_age.

  4. Attempt to retrieve latitude and longitude information from any Google Maps URLs present in an image, link, or message key.

  5. If an image key exists, save it locally as a JPG and write the corresponding location out in the local:image key.

  6. If the image contains valid Exif data, override as much entry information as possible using that information.

  7. If not already set, attempt to determine the entry timezone using GeoNames and any available latitude and longitude information. If this fails, then assume that we're using the system timezone.

  8. Finally, generate activity, location, music, steps, and weather metadata entries from non-metadata entries, if such entries possess suitable keys. (This step is done to provide us with the richest possible universe of metadata.)

The resulting log entries (both those that literally correspond to files, and additional metadata entries that have been synthesized from non-metadata entries) are then returned to lifelog.rb.
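
To make the first two steps a little more concrete, here is a minimal sketch of reading a directory of entry files and setting the datetime and timestamp keys. It is not the actual YamlFiles code: read_entries is a hypothetical helper, it assumes that date values are quoted strings, and it leaves out archiving, images, timezones, and error handling entirely.

```ruby
require "date"
require "yaml"

DEFAULT_DATE_FORMAT = "%Y-%m-%d %H:%M:%S %Z"

# Hypothetical helper: read every entry file in input_dir, derive the
# datetime and timestamp keys, and skip entries that fall on `today`.
def read_entries(input_dir, extension = "yml", date_format = DEFAULT_DATE_FORMAT,
                 today = Date.today)
  pattern = File.join(input_dir, "*.#{extension}")
  Dir.glob(pattern).sort.each_with_object([]) do |path, entries|
    entry = YAML.safe_load(File.read(path))
    next unless entry.is_a?(Hash)

    # Prefer the date key; fall back to the file modification time.
    datetime = if entry["date"]
                 DateTime.strptime(entry["date"].to_s, date_format)
               else
                 File.mtime(path).to_datetime
               end
    next if datetime.to_date == today # today's log may still be incomplete

    entries << entry.merge(
      "datetime"     => datetime,
      "timestamp"    => datetime.to_time.to_i,
      "local:source" => path
    )
  end
end
```

Calling read_entries("/path/to/input_dir") would return an array of hashes in the shape that lifelog.rb expects from a read method.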

Outputs

The following output modules are currently included by default with lifelog.rb.

DayOne

The DayOne output module writes out Day One journal entries. Entries are condensed by type, except that those with locally-available images (local:image keys) are broken out such that every image has its own entry. As much metadata as is sensible is set for each entry, and any embedded hash tags (#HashTag, etc.) are automatically included as journal entry tags. Each entry covers at most one day, and the timestamp is set to that of the last log entry listed in the corresponding Day One journal entry.

DayOne accepts the following configuration options:

  • journal_dir: The full path to Day One's journal directory; note that since the Day One journal is a bundle, it will appear as a file in the Finder. DayOne will abort if journal_dir is not set!

  • starred: Set to true or false to indicate whether entries should be starred or not. Defaults to false if unset.

  • tags: A list of tags that should automatically be added to any Day One journal entries. Defaults to the empty list ([]) if unset.
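
The interaction between embedded hash tags and the tags option might look roughly like the following. The entry_tags helper and its regular expression are assumptions made for illustration; the pattern actually used by DayOne may well differ.

```ruby
# Hypothetical helper: collect hash tags embedded in a message and merge
# them with the tags configured for the module.
def entry_tags(message, configured_tags = [])
  # Match "#Tag" only when it is not glued to a preceding word or "&",
  # and only when the tag starts with a letter (so "#42" is skipped).
  embedded = message.to_s.scan(/(?<![\w&])#([[:alpha:]]\w*)/).flatten
  (configured_tags + embedded).uniq
end

entry_tags("Lunch at the #Library with #friends", ["lifelog"])
# => ["lifelog", "Library", "friends"]
```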

DayOne requires the date, erb, fileutils, securerandom, and socket gems to function, all of which should be included as part of a standard Ruby installation. DayOne's write method uses the following workflow:

  1. Separate out activity, location, music, steps, and weather metadata entries, and normalize all usable keys. Discard all other metadata entries.

  2. Separate out all non-metadata image entries, and normalize all usable keys.

  3. Separate out all non-metadata, non-image entries, and normalize all usable keys.

  4. Collect all non-metadata, non-image entries by type, sort by timestamp, and generate aggregate entries.

  5. Add metadata to all image entries and aggregate entries that don't already have it.

  6. Attempt to calculate Day One Region dictionaries (used by maps) based upon available GPS data.

  7. Write out all image and aggregate entries to individual Day One journal files.

Writing Modules

Modules for lifelog.rb are just ordinary Ruby modules with either a read() or write() method (or both) that are placed in the module_path directory. Internally, a module has the following form:

module MyLifelogModule
    require "required_gem_1"
    require "required_gem_2"

    def self.read(config, software_id, last_run, current_run)
        # Called if the module is listed as an input key.
    end

    def self.write(journal, config, software_id, last_run, current_run)
        # Called if the module is listed as an output key.
    end

    def self.my_helper_function_1(some_input_1, some_input_2)
        # Internal helper function *not* called by `lifelog.rb`.
    end

    def self.my_helper_function_2(some_input_1, some_input_2)
        # Internal helper function *not* called by `lifelog.rb`.
    end
end

This code would then live as a file called mylifelogmodule.rb in module_path. A module must include either a self.read or self.write method to be functional, and may even include both.

The read Method

Read methods should accept the following inputs:

  • config: Normally a hash defining any necessary configuration options, extracted from ~/.lifelog.yml, as key-value pairs; however, under certain circumstances arrays or individual values may be passed instead.

  • software_id: A string representing the current name/version of lifelog.rb.

  • last_run: A Ruby DateTime object encoding the last time lifelog.rb was run (i.e., the timestamp encoded in the lock_file). If no previous run value can be calculated, then nil will be passed instead.

  • current_run: A Ruby DateTime object encoding the time the current run of lifelog.rb began.

The read method should return an array of hashes, each hash representing a single entry, with keys loosely following the conventions described later in this document.

Any input file archiving or deletion should also be handled by this method.
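
For illustration, a very small input module along these lines might look as follows. StaticNotes and its notes option are hypothetical; the point is only to show the method signature, the nil check on last_run, and the shape of the returned entries.

```ruby
require "date"

# Hypothetical input module that turns notes listed in its own
# configuration into log entries.
module StaticNotes
  def self.read(config, software_id, last_run, current_run)
    # config is not guaranteed to be a hash, so bail out politely.
    return [] unless config.is_a?(Hash)

    entries = config.fetch("notes", []).map do |note|
      datetime = DateTime.parse(note["date"])
      {
        "date"      => note["date"],
        "datetime"  => datetime,
        "timestamp" => datetime.to_time.to_i,
        "message"   => note["text"],
        "type"      => "notes"
      }
    end

    # last_run is nil on the very first run, in which case keep everything.
    entries.select { |entry| last_run.nil? || entry["datetime"] > last_run }
  end
end

config = { "notes" => [
  { "date" => "2014-11-01 09:00:00 UTC", "text" => "Old note" },
  { "date" => "2014-11-03 09:00:00 UTC", "text" => "New note" }
] }
last_run = DateTime.new(2014, 11, 2)
entries = StaticNotes.read(config, "lifelog-demo", last_run, DateTime.now)
```

With the last_run value above, only the "New note" entry is returned.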

The write Method

The write method should accept the following inputs:

  • journal: An array of hashes, each hash representing a single entry, with keys loosely following the conventions described later in this document. This array will be the sum total of all data generated by the input module read methods.

  • config: Normally a hash defining any necessary configuration options, extracted from ~/.lifelog.yml, as key-value pairs; however, under certain circumstances arrays or individual values may be passed instead.

  • software_id: A string representing the current name/version of lifelog.rb.

  • last_run: A Ruby DateTime object encoding the last time lifelog.rb was run (i.e., the timestamp encoded in the lock_file). If no previous run value can be calculated, then nil will be passed instead.

  • current_run: A Ruby DateTime object encoding the time the current run of lifelog.rb began.

write methods will often produce file output, but do not directly interact with file input. Because a variety of input sources may be used, it is the write method's responsibility to divide up the journal as appropriate, discard any entries that it cannot use, and normalize the remaining entries in whichever way is required.

Return values of write methods are ignored by lifelog.rb.
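
As a sketch of what such a method could look like, here is a hypothetical PlainText output module that discards metadata entries, sorts what remains, and writes one line per entry. The module name and its output_file option are invented for this example, and the journal is only read, never modified.

```ruby
require "date"

# Hypothetical output module that writes one line of text per entry.
module PlainText
  def self.write(journal, config, software_id, last_run, current_run)
    return unless config.is_a?(Hash) && config["output_file"]

    # Work on derived arrays only; the journal itself is left untouched.
    usable = journal.reject { |entry| entry["metadata"] }
    usable = usable.select { |entry| entry["message"] && entry["datetime"] }
    lines = usable.sort_by { |entry| entry["timestamp"].to_i }.map do |entry|
      format("%s %s", entry["datetime"].strftime("%Y-%m-%d %H:%M"), entry["message"])
    end

    File.write(config["output_file"], lines.join("\n") + "\n")
  end
end
```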

Configuration Data

Suppose we have a ~/.lifelog.yml file that looks like the following:

module_path: /home/myuser/.lifelog/modules

lock_file: /home/myuser/.lifelog/lock

input:
  FirstModule:
    option1: "value1"
    option2: 2.0
  SecondModule:
    - 1
    - "foo"
    - "bar"

output:
  SecondModule:
    option1: false
    option2:
      - array1
      - array2
  ThirdModule: ""

This file will result in FirstModule.read and SecondModule.read being used for input, and SecondModule.write and ThirdModule.write being used for output.

FirstModule.read will be passed the following configuration data:

config = {
    "option1" => "value1",
    "option2" => 2.0
}

SecondModule.read will instead get passed an array for its configuration:

config = [ 1, "foo", "bar" ]

SecondModule.write will get a slightly more complicated hash:

config = {
    "option1" => false,
    "option2" => [ "array1", "array2" ]
}

Finally, ThirdModule.write will be passed an empty string for its configuration (this is probably what you want to do if your module requires no configuration):

config = ""
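
Since config can arrive as a hash, an array, or a bare value, a module may want to normalize it before use. One possible approach is sketched below; options_from is a hypothetical helper, and the "values" and "value" keys are arbitrary names chosen for this example.

```ruby
# Hypothetical helper: coerce whatever lifelog.rb passes as config into an
# options hash, filling in defaults for anything that is missing.
def options_from(config, defaults = {})
  case config
  when Hash    then defaults.merge(config)
  when Array   then defaults.merge("values" => config)
  when nil, "" then defaults.dup
  else              defaults.merge("value" => config)
  end
end

options_from({ "option1" => "value1" }, { "option2" => 2.0 })
# => {"option2"=>2.0, "option1"=>"value1"}
options_from([1, "foo", "bar"])
# => {"values"=>[1, "foo", "bar"]}
options_from("")
# => {}
```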

Additional Conventions

The only formal restriction placed on the output of module read methods is that they should return an array of hashes, and that each hash should represent a distinct log entry, different components of which are designated by different key-value pairs. write methods are free to do whatever they like with this data; a particular key-value pair that is important for one output module may be completely ignored by another. There are a few key-value pairs established in the included modules that developers should probably follow, however. For generic entries, these are:

  • date: The full date and time as a string, preferably with timezone information. This should be used to initially set the entry date and time, but should ideally not be parsed further; output modules should use datetime or timestamp instead.

  • datetime: A Ruby DateTime object that encodes the actual date and time of a particular entry. This should obviously be calculated by the input module. Input modules must provide datetime keys for all entries!

  • timestamp: The UNIX timestamp of the entry, down to the second. This might be calculated by the input module, or alternately might be provided as part of whatever data the input module is parsing. In the latter case, a timezone specifier should also be provided. Input modules must provide timestamp keys for all entries!

  • timezone: A string specifying the IANA timezone of the entry.

  • allday: A boolean (true/false) indicating whether the given entry is "all day" (does not reference a particular time) or not. It is up to output modules to decide how to handle all-day entries, though input modules may provide hints by setting date, datetime, or timestamp.

  • image: The URL of an image associated with the entry.

  • local:image: Input modules may download files specified by the image key, or reference an existing image on disk, in which case local:image should be used to specify the path to that image.

  • licensing: A string providing licensing information for the given entry. If linking, etc. is required, use Markdown.

  • link: A reference URL for the entry.

  • message: The entry text itself. If formatting is required, use Markdown. All entries must either have a message key or have metadata set to true!

  • service: A string specifying a service (Flickr, Twitter, etc.) which is associated with the entry. May also be abused to provide "asides" to the main message.

  • type: A rough categorization for the entry (for example, "reading", "photography", "travels", etc.). type must be specified when metadata is set to true!

  • local:source: If the entry is generated using a local file, the input module should set local:source to the path of this file.

  • remote:source: If the entry is generated using a remote file or API, the input module should set remote:source to the appropriate URL.

  • metadata: Set to true if the entry represents metadata (weather, location, etc.) rather than a literal entry ("I did something"). Entries with metadata set to true must have a type set; entries without a metadata attribute, or for which metadata is set to false, must have a message set!

  • activity_window or location_window or music_window or steps_window or weather_window or window: The number of seconds before and after for which a given metadata entry should be considered valid. Note that this may not always be set by input modules, so output modules should implement sensible defaults. (window is the default key; the others are used if multiple metadata types are present in a single entry.)
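
The window keys could be used by an output module roughly as follows. This is a sketch only: nearest_metadata is a hypothetical helper, timestamps are assumed to be integers, and the one-hour default window is an arbitrary choice rather than a value taken from the included modules.

```ruby
DEFAULT_WINDOW = 3600 # seconds; an arbitrary default for this sketch

# Hypothetical helper: find the metadata entry of the given type that is
# closest in time to `entry`, provided `entry` falls inside its window.
def nearest_metadata(entry, journal, type)
  candidates = journal.select do |meta|
    next false unless meta["metadata"] && meta["type"] == type
    # A type-specific window takes precedence over the generic one.
    window = (meta["#{type}_window"] || meta["window"] || DEFAULT_WINDOW).to_i
    (meta["timestamp"] - entry["timestamp"]).abs <= window
  end
  candidates.min_by { |meta| (meta["timestamp"] - entry["timestamp"]).abs }
end
```

An entry with no matching metadata inside any window simply gets nil back, so the caller can decide whether to go without or to fall back on something else.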

Currently I've thought through some special keys associated with metadata entries as well (of course, all of these keys are also valid as part of generic entries). For metadata with a type of "activity" we have a single key:

  • activity: A string specifying the current activity. This riffs off of the values permitted by Day One, which are "Stationary", "Walking", "Running", "Biking", "Eating", "Automotive", "Flying", and "Train". The only real restriction on this string, though, is that it be plain text and relatively short (one or two words).

Metadata with a type of "steps" likewise has a single key:

  • steps: An integer value recording the number of steps on the current day through the time of the entry.

Metadata with a "location" type is a little richer (latitude and longitude are the most important of these keys):

  • latitude: The latitude of the entry's location in decimal form.

  • longitude: The longitude of the entry's location in decimal form.

  • venue: A string representing the name of the entry's location ("15 Vine Street", "New York Public Library", etc.)

  • city: A string representing the city of the entry's location.

  • state: The two-character state/province abbreviation of the entry's location.

  • country: A string holding the country name of the entry's location.

  • country_code: The ISO 3166-1 alpha-2 country code of the entry's location. This is currently only used as an internal convenience, as its presence can make it easier for input modules to determine an entry's timezone (if necessary).

The "music" metadata type is again a bit more minimal (artist and track are really the minimal sensible requirements here):

  • album: A string holding the album name of the music (if any) associated with the current entry.

  • artist: A string holding the artist of the music (if any) associated with the current entry.

  • track: A string holding the track name of the music (if any) associated with the current entry.

Finally, "weather" type metadata is even more diverse (a temperature and condition specification is probably the minimum sensible pair of keys):

  • celsius: An integer representing the temperature, in degrees Celsius, associated with the current entry.

  • fahrenheit: An integer representing the temperature, in degrees Fahrenheit, associated with the current entry.

  • pressure_mb: A real number representing the atmospheric pressure, in millibars, associated with the current entry.

  • relative_humidity: An integer representing the relative humidity (as a percentage, though the percent symbol should not be included) associated with the current entry.

  • visibility_km: A real number representing the visibility in kilometers associated with the current entry.

  • wind_kph: A real number representing the wind speed, in kilometers per hour, associated with the current entry.

  • weather_icon or icon: A string holding the name of the "icon" representing the current weather. As different log output formats will probably require different icon names/formats, it is probably best to leave it up to the output modules to determine how to use this value (or whether to synthesize such an icon name themselves using the conditions string).

  • wind_direction or wind_bearing: A string ("NW", "South South West", etc.) or integer compass bearing (clockwise, 0 degrees as North) representing the direction from which the wind originates for the current entry.

  • conditions or weather_conditions or description: A short string (one or two words) describing the weather conditions ("Sunny", "Partly Cloudy", "Raining") associated with the current entry.
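
Because several of these keys have alternate names, an output module will probably want to normalize them before use. A minimal sketch follows; normalize_weather is a hypothetical helper, and the choice of which name counts as canonical is an assumption of this example. It also fills in whichever of celsius and fahrenheit is missing.

```ruby
# Canonical key => accepted names, in order of preference (an assumption).
WEATHER_ALIASES = {
  "icon"           => %w[weather_icon icon],
  "wind_direction" => %w[wind_direction wind_bearing],
  "conditions"     => %w[conditions weather_conditions description]
}.freeze

# Hypothetical helper: return a copy of the entry with canonical weather
# keys set and both temperature scales present where possible.
def normalize_weather(entry)
  normalized = entry.dup
  WEATHER_ALIASES.each do |canonical, names|
    found = names.find { |name| entry.key?(name) }
    normalized[canonical] = entry[found] if found
  end
  if normalized["celsius"].nil? && normalized["fahrenheit"]
    normalized["celsius"] = ((normalized["fahrenheit"] - 32) * 5.0 / 9).round
  elsif normalized["fahrenheit"].nil? && normalized["celsius"]
    normalized["fahrenheit"] = (normalized["celsius"] * 9.0 / 5 + 32).round
  end
  normalized
end

normalize_weather("description" => "Sunny", "fahrenheit" => 68)
# => {"description"=>"Sunny", "fahrenheit"=>68, "conditions"=>"Sunny", "celsius"=>20}
```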

All keys should appear at most once per entry; all strings should be plain text (no HTML, textile, etc.) except for message and licensing, which should be Markdown (when in doubt, follow Day One's Markdown Guide, which is essentially a variant of GitHub Flavored Markdown).
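
The hard requirements scattered through the key descriptions above (datetime and timestamp on every entry, a type on metadata entries, a message on everything else) are easy to check mechanically. The entry_problems helper below is a hypothetical sketch of such a check, not something lifelog.rb is known to perform.

```ruby
# Hypothetical helper: list the ways in which an entry breaks the
# conventions described above (an empty list means it looks fine).
def entry_problems(entry)
  problems = []
  %w[datetime timestamp].each do |key|
    problems << "missing #{key}" unless entry.key?(key)
  end
  if entry["metadata"] == true
    problems << "metadata entries need a type" if entry["type"].to_s.empty?
  elsif entry["message"].to_s.empty?
    problems << "non-metadata entries need a message"
  end
  problems
end

entry_problems("metadata" => true, "timestamp" => 0)
# => ["missing datetime", "metadata entries need a type"]
```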

As a matter of good form, modules should only deal with a single data source/output. So, for example the following modules are fine:

  • A module with only a self.read method that aggregates Facebook posts using API calls.

  • A module with only a self.write method that pushes lifelog data to a Google Spreadsheet.

  • A module with both self.read and self.write methods that reads and writes data from/to WordPress blogs.

However, the following modules should be avoided or split into multiple, separate components:

  • Modules that read from or write to multiple sources/outputs simultaneously.

  • Modules whose self.read and self.write methods use different data sources.

Example Use-Case

Currently, I use IFTTT to generate plain text files in more-or-less YAML format (though see the notes under Known Issues, below) in my Dropbox account. IFTTT doesn't support indentation or arbitrary timestamp formats, and the timestamps it does provide are not always helpful, but overall the system works well enough. But if you're wondering why YamlFiles has so many gyrations related to timestamps, timezones, and format conversions, now you know.

Day One is, of course, the primary way I store lifelog data once it's been processed by lifelog.rb (actually, right now it's the only way I store this data).

In the future I intend to extend this script using more service-specific input modules, and will probably implement at least one additional output module (using either a database or Git as my back-end).

Known Issues

lifelog.rb is the first substantial script I've written in Ruby, and I'm sure it has problems. Here are the ones I currently know about:

  • The YamlFiles module attempts to circumvent parsing errors by forcing all data to be strings. But this is a pretty ham-fisted approach, and will likely fail in the event that key-value pairs are properly quoted or escaped! Error handling needs to be made much more robust here.

  • The DayOne module suffers from a couple of glaring deficiencies:

    • Hardware and OS versions are currently determined by making direct binary calls. This is not only not very Ruby-ish, but is also fragile and a potential security issue (though I'm sure lifelog.rb has other security problems as well...).

    • I have no idea what the LocationRegionCenterRadius key actually does or represents, and am currently just trying to generate a number that is more-or-less consistent with the values Day One seems to generate on its own. This is an obviously less-than-desirable approach.

License

This script is licensed under the GNU GPL v3. See the LICENSE file in this directory for the full license text.