Chris Grubbs committed f446239

Naive first stab at differentiating ingredients from instructions in sk (smittenkitchen) output

Files changed (2)

phlombay/analyzer/analyzer.py

+import json
+import nltk
+import sys
+
+
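+def ingredient_special_cases(sentence):
+    # Substring match so that both '(optional)' and '(optional, ...' are caught;
+    # comparing whole whitespace-split tokens missed the common '(optional)' form.
+    return '(optional' in sentence.lower()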
+
+
+if __name__ == '__main__':
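+    try:
+        # Use a context manager so the file handle is closed promptly.
+        with open('smittenkitchen.json', 'r') as recipe_file:
+            recipes = json.load(recipe_file)
+    except IOError:
+        sys.exit('Recipe file not found.')
+
+    for recipe in recipes:
+        if len(recipe['content'][0]):
+            for sentence in recipe['content'][0].split('\n'):
+                # Blank lines tokenize to nothing; skip them so that
+                # tagged_sentence[0] cannot raise IndexError.
+                if not sentence.strip():
+                    continue
+                text = nltk.word_tokenize(sentence)
+                tagged_sentence = nltk.pos_tag(text)
+                # A leading cardinal number (CD) or list marker (LS) suggests an ingredient line.
+                if tagged_sentence[0][1] in {'CD', 'LS'} or ingredient_special_cases(sentence):
+                    print(u'This is an ingredient: {}'.format(sentence))
+                else:
+                    print(u'This is not an ingredient: {}'.format(sentence))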
-Django==1.4
+Django==1.4
+nltk
+numpy
+Scrapy==0.14.4
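
The heuristic in analyzer.py can be tried without NLTK or its tagger data using a rough stand-in. The sketch below is not the committed code: it replaces the `CD`/`LS` part-of-speech check with a plain "first token starts with a digit" test, and the names `starts_with_number`, `has_optional_marker`, `looks_like_ingredient`, and `split_recipe` are hypothetical helpers introduced only for illustration.

```python
def has_optional_marker(line):
    """In the spirit of the commit's special case: an '(optional' note marks an ingredient."""
    return '(optional' in line.lower()


def starts_with_number(line):
    """Assumed stand-in for the CD/LS tag check: does the first token begin with a digit?"""
    tokens = line.split()
    return bool(tokens) and tokens[0][0].isdigit()


def looks_like_ingredient(line):
    """Combine the two signals the same way the commit does (either one is enough)."""
    return starts_with_number(line) or has_optional_marker(line)


def split_recipe(content):
    """Partition newline-separated recipe text into (ingredients, instructions)."""
    ingredients, instructions = [], []
    for line in content.split('\n'):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if looks_like_ingredient(line):
            ingredients.append(line)
        else:
            instructions.append(line)
    return ingredients, instructions


if __name__ == '__main__':
    sample = '2 cups flour\n\nFlaky salt (optional)\nWhisk the eggs until pale.'
    ingredients, instructions = split_recipe(sample)
    print(ingredients)
    print(instructions)
```

Like the tagger-based version, this is deliberately naive: a numbered step such as "1. Preheat the oven." also starts with a digit and would be counted as an ingredient, and an ingredient with no quantity and no "(optional" note would be counted as an instruction.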