Add metafeatures to feature extraction step

In addition to the regular-expression-based feature extraction, we should extract meta features such as the length of a tweet. See the blog post "Predicting HN upvotes using headlines" at https://www.dataquest.io/blog/predicting-upvotes/ for one approach. It has code like:

import re
import numpy

# Our list of functions to apply.
transform_functions = [
    lambda x: len(x),
    lambda x: x.count(" "),
    lambda x: x.count("."),
    lambda x: x.count("!"),
    lambda x: x.count("?"),
    lambda x: len(x) / (x.count(" ") + 1),
    lambda x: x.count(" ") / (x.count(".") + 1),
    lambda x: len(re.findall(r"\d", x)),  # raw string avoids an invalid "\d" escape warning
    lambda x: len(re.findall("[A-Z]", x)),
]

# Apply each function and put the results into a list.
# `submissions` is the pandas DataFrame from the blog post; "headline" is its text column.
columns = []
for func in transform_functions:
    columns.append(submissions["headline"].apply(func))

# Convert the meta features to a numpy array.
meta = numpy.asarray(columns).T
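
For our case the same idea could be applied to tweet text instead of headlines. Below is a minimal sketch using only the standard library. The names `META_FEATURES` and `extract_meta_features` are hypothetical (they are not from the blog post or from our code base), and the feature labels are my own guesses at what each function measures. It assumes tweets arrive as a plain list of strings.

```python
import re

# Hypothetical (label, function) pairs mirroring the blog post's list.
# The labels are assumptions added for readability only.
META_FEATURES = [
    ("length", lambda x: len(x)),
    ("spaces", lambda x: x.count(" ")),
    ("periods", lambda x: x.count(".")),
    ("exclamations", lambda x: x.count("!")),
    ("questions", lambda x: x.count("?")),
    ("chars_per_word", lambda x: len(x) / (x.count(" ") + 1)),
    ("spaces_per_sentence", lambda x: x.count(" ") / (x.count(".") + 1)),
    ("digits", lambda x: len(re.findall(r"\d", x))),
    ("capitals", lambda x: len(re.findall(r"[A-Z]", x))),
]


def extract_meta_features(tweets):
    """Return one row of meta features per tweet (rows = tweets, columns = features)."""
    return [[func(tweet) for _, func in META_FEATURES] for tweet in tweets]


# Example: two tweets, the second one empty to show the "+ 1" guards
# keep the ratio features from dividing by zero.
rows = extract_meta_features(["Hello World! 42", ""])
for row in rows:
    print(row)
```

The result is already laid out with one row per tweet, so it should be possible to convert it with numpy.asarray(rows) and stack it next to the regex-based features, assuming those are also stored one row per tweet.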
