Author Commit Message Date Builds
17 commits behind dev_kinan.
test: excerpt from cooc-matrix with the playlists in the test set; training: pd.Series, index = pid, entry = name
more comments
binary search for a weight is stopped if no improvement occurs after a step
contains column number of playlists and playlist name for the training set
added initial variance threshold
variance of the evaluation results of the 10 subsets is considered
Merge branch 'dev_selin2' of into dev_selin
incomplete CV-procedure
ignore - trying to repair this branch
incomplete cross-validation procedure
added some comments
training set can now be generated (outputs a csv file with format: pid, name, [tracks])
creating a training_set
Kinan Halloum
get_score is now much more efficient; in addition, similar playlist names are cached (generating the most similar playlists for a given playlist name takes ~100 ms; computing the similarity of a track to a cached playlist name takes ~1 ms)
name_sim*.py files determine how well a track fits a playlist name
removed some dead code parts
Code I used for generating the 7k recommendations
no message
Cleaned the code a bit
Code is not cleaned yet, but the tracks for the playlists with zero tracks are generated (first 1000 playlists out of 1000000)
Changed the analyzer of the TfidfVectorizer to n-grams to compare character n-grams and not words. Currently trying to find out how to handle emojis.
tidying up my branch
In the previous commit it was only possible to use existing playlist names, not newly invented ones.
- Creating a list of playlist names reusing Kinan's code (approx. 30 min)
- Computing the similarity of a playlist name to all other playlist names (in 10s :D)
- Returns dataframe with pids of most similar playlist names
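
The name-similarity commits above compare playlist names by character n-grams, so that a newly invented name can still be matched against known ones. A minimal sketch of that idea follows. It uses only the Python standard library and is not the repository's code or scikit-learn's TfidfVectorizer; the function names, the n-gram range (2 to 3) and the example playlist names are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Return the character n-grams of a lowercased name (n-gram range is an assumption)."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def fit_idf(names):
    """Compute a smoothed inverse document frequency for every n-gram in the known names."""
    n_docs = len(names)
    doc_freq = Counter(g for name in names for g in set(char_ngrams(name)))
    return {g: math.log((1 + n_docs) / (1 + df)) + 1 for g, df in doc_freq.items()}


def vectorize(name, idf):
    """Turn a name into a sparse TF-IDF vector; n-grams unseen during fitting are dropped."""
    counts = Counter(char_ngrams(name))
    return {g: tf * idf[g] for g, tf in counts.items() if g in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, names, top_k=2):
    """Rank the known names by similarity to a query name, which may be new."""
    idf = fit_idf(names)
    query_vec = vectorize(query, idf)
    scored = [(cosine(query_vec, vectorize(name, idf)), name) for name in names]
    return sorted(scored, reverse=True)[:top_k]


# Hypothetical playlist names, not taken from the dataset.
names = ["Chill Vibes", "chill vibez", "Workout Mix", "Rock Classics"]
ranking = most_similar("chil vibes", names)
```

Because the comparison works on characters rather than words, the misspelled query "chil vibes" still ranks the two "chill" playlists first. A word-level analyzer would find no shared token with "Chill Vibes" beyond "vibes" and none at all with "chill vibez".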