gladson
=======

A Python package for retrieving and parsing product data and images using `Gladson's`_ Replication API.

Requirements
------------

- You will need Python version 2.7 or 3.4+ installed
- You will need API credentials provided by Gladson_ (a user name, password, token, and API key)

To install:

.. code-block:: shell

    pip install gladson

To follow and execute subsequent code examples, you will need to import the following:

.. code-block:: python

    import os
    import webbrowser
    from datetime import timedelta, date
    from http.client import HTTPResponse
    from tempfile import gettempdir
    from time import gmtime
    from gladson.replication import Replication, ProductSearchResult, Asset

The following are only needed for `type hinting`_ in supporting IDEs:

.. code-block:: python

    from typing import Iterator

Authenticate & Connect
----------------------

To connect to Gladson's Replication API, create an instance of the *Replication* class:

.. code-block:: python

    replication = Replication(
        user='your_username',
        password='your_password',
        user_token='your_user_token',
        api_key='your_api_key'
    )

Product Search
--------------

Use *Replication.product.search()* to identify items which have information and/or images available, based on a set of given parameters.

In the example below, we request a list of all the products which have been updated no earlier than one week ago, and no later than yesterday:

.. code-block:: python

    for result in replication.product.search(
        start_date=date.today() - timedelta(days=7),  # Seven days ago
        end_date=date.today() - timedelta(days=1)  # Yesterday
    ):  # type: ProductSearchResult
        print(result)

The parameters which can be passed to *Replication.product.search()* include:

- **upc** (*int* or *str*): A universal product code (also known as a "global trade item number").
- **brand** (*str*): A brand name.
- **on_hold** (*bool*): If *True*, search results include items which manufacturers have flagged as not currently on the market. Defaults to *False*.
- **language** (*str*): A language code (or country-dependent language code) indicating the language in which to return attribute values. This parameter defaults to "US-ENG".
- **description** (*str*): Keywords to search for in product descriptions.
- **start_date** (*date*): All products which were last updated prior to this date will be excluded from returned results. Note: At the time this was written, the earliest *receipt date* in Gladson's database was 1999-11-01.
- **end_date** (*date*): All products which were last updated after this date will be excluded from returned results. This parameter defaults to today's date.
- **category** (*str*): If provided, only items with the specified category name are returned.

Note: A few additional parameters can be passed for logging purposes. You can retrieve documentation concerning *all* of these parameters by displaying the docstring_ for this method (as is true for all classes/functions addressed in this document):

.. code-block:: python

    help(Replication.product.search)

*Replication.product.search()* returns a sequence of ProductSearchResult_ instances:

=========================== ============================================================================================
Class                       **Properties** (*types*)
=========================== ============================================================================================
_`ProductSearchResult`      - **date_and_time_of_reception** (*date*): The date that the product was received and processed by Gladson.
                            - **upc** (*str*): The product’s "Universal Product Code", in GTIN-14 format.
                            - **modifier** (*str*): A single-character text string distinguishing variations of an item which share the same UPC.
                            - **brand** (*str*): The brand name of the product.
                            - **categories** (Sequence[Category_]): A sequence of "categories": names and identifiers for a set of products having similar attribution and taxonomy.
                            - **language** (*str*): The language in which properties of this product are expressed. For example: "English".
                            - **description** (*str*): A generic description of the product, followed by distinguishing variables. For example: "Beef Pattie Fritters, Country Fried Steak", "Ink Cartridge, Standard-Capacity, Cyan/Magenta/Yellow, T200520", or "Mineral Spirits, Odorless".
                            - **assets** (Sequence[Asset_]): A sequence of Asset_ instances with information concerning media assets which are available for this product.
=========================== ============================================================================================

Retrieving Product Details
--------------------------

A product search yields ProductSearchResult_ instances, each of which carries the identifiers needed to retrieve product *details* and/or *assets*.

The parameters which can be passed to *Replication.product.details()* include:

- **upc** (*str* or *int*): The UPC (also known as GTIN/EAN/ISBN) of an item.
- **modifier** (*str*): A single-character modifier (A-Z) identifying a variant when multiple distinct items share the same UPC.
- **brand** (*str*): The brand name of the item. Note: Some items with the same UPC are branded differently in different regions, hence the need to be able to differentiate.

In the following example, we search for products updated within the past week, then retrieve and print their product details:

.. code-block:: python

    for result in replication.product.search(
        start_date=date.today() - timedelta(days=7),  # Seven days ago
        end_date=date.today() - timedelta(days=1)  # Yesterday
    ):  # type: ProductSearchResult
        print(result)
        details = replication.product.details(
            result.upc,
            modifier=result.modifier
        )  # type: ProductDetails
        print(details)

Interpreting Product Details
----------------------------

*Replication.product.details()* returns an instance of ProductDetails_, the content of which is described in the table below.

=========================== ============================================================================================
Class                       **Properties** (*types*)
=========================== ============================================================================================
_`ProductDetails`           Identification:

                            - **upc** (*str*): A 14- or 15-digit universal product code, also known as a GTIN.
                            - **modifier** (*str*): A single-character string distinguishing variations of an item which share the same UPC.

                            Classification:

                            - **segment** (*str*): A high-level product classification.

                              + "FOOD": Food and Beverages
                              + "HBC": Health, Beauty & Cosmetics
                              + "GM": General Merchandise

                            - **categories** (Categories_): A container object for a sequence of Category_ instances denoting classification of the product within Gladson's taxonomy.

                            General Information:

                            - **source_label** (*str*): The source(s) from whom the product information was received, for example: "Gladson" or "Manufacturer".
                            - **language** (*str*): The language in which properties of this product are expressed. For example: "English".
                            - **description** (*str*): The *item_name* plus any product "attributes", separated by commas. For example: "Toothpaste, Fluoride, Clean Mint" or "Pasta Sauce, Garden Veggie".
                            - **item_name** (*str*): The simplest, most basic description of the product. For example: "Coffee", "Cookies", or "Tea".
                            - **directions** (*str*): Any instructions for use, as listed on the product.
                            - **brand** (*str*): The most generalized name of the brand which can be found on the product/packaging.
                            - **product_line** (*str*): A branding subset, also commonly referred to as a sub-brand. For example: in "Ford Focus", "Ford" would be the *brand* and "Focus" the *product_line*.
                            - **variant** (*str*): A variable attribute of the product such as a flavor, fragrance, taste, or color.
                            - **warnings** (*str*): Cautionary statements and warnings as shown on the product's packaging.
                            - **is_discontinued** (*bool*)
                            - **assets** (Sequence[Asset_]): A sequence of Asset_ instances, each containing information about a media asset, and each of which can be used to retrieve the associated file.

                            Company Information:

                            - **manufacturer** (*str*): The name of the company which manufactured this product.
                            - **phone** (*str*): The manufacturer phone number, as displayed on the product/packaging.
                            - **address** (*str*): The address displayed on the box for the brand/manufacturer/distributor of the product.
                            - **copyright** (*str*): Copyright information from the manufacturer as shown on the package.

                            Dates:

                            - **date_updated** (*date*): The date on which this product information was captured/posted/updated. This is the only date property defined in the documentation most recently provided by Gladson at the time of writing; however, it is sparsely populated.
                            - **post_date** (*date*): (inferred) The date on which this product information was captured/posted/updated.
                            - **postdate** (*date*): (inferred) The date on which this product information was captured/posted/updated.
                            - **date_and_time_of_reception** (*date*): (inferred) The date on which this product information was received/updated. Note: This date is present for a product more frequently than the previous three, but is often significantly later than the *date_updated*, *post_date*, or *postdate*.

                            Drug Information:

                            - **drug_interactions** (*str*)
                            - **indications** (*str*): Indicates what ailments a product (typically an over-the-counter medication) can/should be used to treat.

                            Food and Beverage Information:

                            - **has_nutrition** (*bool*): Indicates whether nutrition information is available for this product.
                            - **ingredients** (*Ingredients*): A container object for a sequence of ingredients contained in the product.
                            - **kosher1** (*int*): When present, this indicates that a symbol known to indicate a Kosher certification was found on the product's label. The number indicated correlates to a specific symbol/certification.
                            - **kosher2** (*int*): See *kosher1*.
                            - **kosher3** (*int*): See *kosher1*.
                            - **kosher4** (*int*): See *kosher1*.
                            - **kosher5** (*int*): See *kosher1*.
                            - **nutrition_facts** (*NutritionFacts*): A container object holding one or more Variant_ instances, each of which contains a set of nutrition facts which vary depending on preparation state and/or serving size.
                            - **value_prepared_count** (*int*): The number of different nutrition fact variants provided which represent a *prepared* variation of this product.

                            Size:

                            - **item_size** (*Decimal*): The number of units in the product expressed in terms of the *item_measure* or *uom*.
                            - **item_measure** (*str*): The unit-of-measure in which the *item_size* is expressed.

                              + "ea": Abbreviated from "each", this indicates that *item_size* is a unit quantity
                              + "g": Gram
                              + "cg": Centigram
                              + "dg": Decigram
                              + "hg": Hectogram
                              + "kg": Kilogram
                              + "lb": Pound
                              + "mcg": Microgram
                              + "mg": Milligram
                              + "mt": Metric Ton
                              + "oz": Ounce
                              + "t": Ton
                              + "cm": Centimeter
                              + "dm": Decimeter
                              + "ft": Foot
                              + "in": Inch
                              + "km": Kilometer
                              + "m": Meter
                              + "mm": Millimeter
                              + "sf": Square Foot
                              + "sm": Square Meter
                              + "sy": Square Yard
                              + "yd": Yard

                            - **uom** (*str*): This is the same as the *item_measure* property, but is expressed in upper-case characters, and is set to *None* rather than "ea" if *item_size* is a unit quantity.
                            - **extended_size** (*str*): A text description of the product's size, as shown on the package. For example: "3 desserts [9 oz (270 ml)]", "22 oz (1 lb 6 oz) 624 g", or "16.9 fl oz (500 ml)".

                            Measurements:

                            - **width** (*Decimal*): The width of the item (in-package), measured in inches.
                            - **height** (*Decimal*): The height of the item (in-package), measured in inches.
                            - **depth** (*Decimal*): The length of the item (in-package), from front to back, measured in inches.
                            - **product_weight** (*Decimal*): The weight of the product (in-package), measured in ounces.

                            Etcetera:

                            - **product_details** (*str*): Additional product information from the package which was not collected in any other fields.
--------------------------- --------------------------------------------------------------------------------------------
_`Asset`                    - **asset_type** (*str*):

                              + "image"
                              + "video"
                              + "document"
                              + "other"

                            - **asset_sub_type** (*str*): A designation further describing the content and/or its intended application. For example: "Ecom" or "Pog".
                            - **file_format** (*str*): The asset's file format/extension. For example: "Jpeg" or "Png".
                            - **max_quality** (*str*): A string or integer describing the highest quality in which the asset is available. For images, this is an integer representing the number of pixels along the longest side of the image. For example: "1000" or "500".
                            - **is_base** (*bool*): If *True*, this asset is delivered as a base64-encoded character string rather than raw bytes.
--------------------------- --------------------------------------------------------------------------------------------
_`Categories`               - **category** (Sequence[Category_])
--------------------------- --------------------------------------------------------------------------------------------
_`Category`                 - **name** (*str*)
                            - **code** (*str*)
--------------------------- --------------------------------------------------------------------------------------------
_`Ingredients`              - **ingredient** (*Sequence[str]*): A list of ingredients.
                            - **additional_ingredients** (AdditionalIngredients_): A container object for a list of "additional" ingredients.
--------------------------- --------------------------------------------------------------------------------------------
_`AdditionalIngredients`    - **additional_ingredient** (*str*): A list of "additional" ingredients.
                            - **additional_ingredients_value** (*str*)
--------------------------- --------------------------------------------------------------------------------------------
_`NutritionFacts`           - **variant** (Sequence[Variant_]): A sequence of *Variant* instances, with each representing nutrition facts based on a particular serving size and preparation state.
--------------------------- --------------------------------------------------------------------------------------------
_`Variant`                  - **serving_size_text** (*str*): A quantitative measure of the serving size (expressed as text). A complete description of the serving size can be assembled by concatenating this with the *serving_size_uom*.
                            - **serving_size_uom** (*str*): The unit of measure in which the *serving_size_text* is expressed.
                            - **servings_per_container** (*str*): A measure of how many servings are contained in the package (can be approximated), as described on the package.
                            - **upc** (*str*): The product's 14- or 15-digit universal product code, also known as a "GTIN".
                            - **nutrient** (Sequence[Nutrient_]): A sequence of objects representing information about individual nutrients.
                            - **value_prepared_type** (*bool*): Do these values refer to a preparation which differs nutritionally from the unprepared state of the product?
                            - **serving_size_in_grams** (*str*): The serving size, in grams, represented as text (typically including the measurement abbreviation "g" as well as the numeric measure).
                            - **variant_value** (*str*)
                            - **serving_size_prepared** (*str*)
--------------------------- --------------------------------------------------------------------------------------------
_`Nutrient`                 - **is_or_contains** (*bool*): If *True*, this implies that the product either has or is the ingredient/nutrient/claim named. This is used with health claims (is) and allergens (contains).
                            - **name** (*str*): The name of the nutrient or, if *is_or_contains* is *True*, the name of the ingredient or claim.
                            - **percentage** (*Decimal*): The amount of this nutrient contained in a serving, expressed in terms of a percentage of the recommended daily consumption of this nutrient as dictated by the regionally relevant governing organization.
                            - **quantity** (*Decimal*): The amount of this nutrient contained in a serving, expressed in terms of the measurement indicated by the *uom*.
                            - **uom** (*str*): The measurement in which the *quantity* of this nutrient is expressed.
                            - **value_prepared_type** (*bool*): Does this value refer to a preparation which differs nutritionally from the unprepared state of the product?
=========================== ============================================================================================

Retrieving Assets
-----------------

An instance of either ProductSearchResult_ or ProductDetails_ has all the information needed to retrieve a product's available assets using *Replication.product.asset()*.

The parameters which can be passed to *Replication.product.asset()* include:

- **upc** (*str* or *int*): The UPC (also known as GTIN/EAN/ISBN) of an item.
- **modifier** (*str*): A single-character modifier (A-Z) identifying a variant when multiple distinct items share the same UPC.
- **brand** (*str*): The brand name of the item.
- **on_hold** (*bool*): If *True*, images for products which have been flagged as not currently being on the market will be returned. Defaults to *False*.
- **asset_type** (*str*): The type of asset: "image", "video", "document", or "other".
- **asset_sub_type** (*str*): A designation further describing the content and/or its intended application. For example: "Ecom" or "Pog".
- **quality** (*str*): A string or integer describing the quality in which the asset is requested. For images, this is an integer representing the number of pixels along the longest side of the image. For example: "1000" or "500".
- **file_format** (*str*): The file format/extension of the asset. For example: "Jpeg" or "Png".
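
Most of the parameters above correspond directly to properties of a search result and of one of its assets. The following is a minimal sketch of that mapping, not part of the gladson package: the helper name *asset_keywords* and the *StandInResult*/*StandInAsset* tuples are hypothetical, introduced only so that the example runs without API credentials.

.. code-block:: python

    from collections import namedtuple

    # Hypothetical stand-ins (not part of gladson) which mimic the properties
    # documented for ProductSearchResult and Asset
    StandInResult = namedtuple('StandInResult', ('upc', 'modifier', 'brand'))
    StandInAsset = namedtuple(
        'StandInAsset',
        ('asset_type', 'asset_sub_type', 'file_format', 'max_quality')
    )


    def asset_keywords(result, asset):
        """Build keyword arguments for *Replication.product.asset()*."""
        return dict(
            upc=result.upc,
            modifier=result.modifier,
            brand=result.brand,
            asset_type=asset.asset_type,
            asset_sub_type=asset.asset_sub_type,
            file_format=asset.file_format,
            # Request the highest quality listed for this asset
            quality=asset.max_quality
        )


    keywords = asset_keywords(
        StandInResult('00012345678905', 'A', 'Example Brand'),
        StandInAsset('image', 'Ecom', 'Jpeg', '1000')
    )
    print(sorted(keywords))

With real objects, the resulting dictionary could presumably be unpacked into the call, as in *replication.product.asset(\*\*asset_keywords(result, asset))*.
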

In the following example, we search for all the items updated in the past seven days, download each of their image assets to a temporary directory, and open each downloaded image using *webbrowser.open()*:

.. code-block:: python

    import webbrowser

    for result in replication.product.search(
        start_date=date.today() - timedelta(days=7),  # Seven days ago
        end_date=date.today() - timedelta(days=1)  # Yesterday
    ):  # type: ProductSearchResult
        for asset in result.assets:  # type: Asset
            print(asset)
            # Compare case-insensitively, as the capitalization of asset
            # types may vary
            if asset.asset_type.lower() == 'image':
                response = replication.product.asset(
                    upc=result.upc,
                    modifier=result.modifier,
                    brand=result.brand,
                    asset_type=asset.asset_type,
                    asset_sub_type=asset.asset_sub_type,
                    file_format=asset.file_format,
                    quality=int(asset.max_quality),
                    on_hold=True
                )  # type: HTTPResponse
                # Build a file path in the system's temporary directory
                path = '%(directory)s/%(upc)s%(modifier)s.%(extension)s' % dict(
                    directory=gettempdir(),
                    upc=result.upc,
                    modifier=result.modifier or '',
                    extension=asset.file_format.lower()
                )
                # Write the raw bytes of the response to the file
                with open(path, mode='wb') as f:
                    f.write(response.read())
                print(path)
                webbrowser.open('file://' + path)

.. _Gladson: http://www.gladson.com
.. _Gladson's: http://www.gladson.com
.. _type hinting: https://www.python.org/dev/peps/pep-0526
.. _docstring: https://www.python.org/dev/peps/pep-0257
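
The file naming used in the download example above can be factored into a small helper. The following is a minimal sketch, not part of the gladson package: *asset_path* is a hypothetical name, and it uses *os.path.join()* in place of the hard-coded "/" separator, so that the resulting path follows the conventions of the current platform.

.. code-block:: python

    import os
    from tempfile import gettempdir


    def asset_path(upc, modifier, file_format, directory=None):
        """Return a local file path at which to save a downloaded asset."""
        # The modifier may be empty or None, so fall back to an empty string
        file_name = '%s%s.%s' % (upc, modifier or '', file_format.lower())
        # Default to the system's temporary directory
        return os.path.join(directory or gettempdir(), file_name)


    print(asset_path('00012345678905', 'A', 'Jpeg'))
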