This is my Master's thesis project, submitted in partial fulfillment of the requirements for the degree of Master of Science in Communication and Information Sciences, Master Track Human Aspects of Information Technology, at the Faculty of Humanities of Tilburg University.

It is a source code component retrieval application that retrieves Java method signatures from the Java Standard Library given an English query.

It works in a fairly unorthodox way: it retrieves methods using bag-of-words translation. The translation model is a Ridge Regression model trained on the term-document matrices of two parallel document collections: Java method signatures and their descriptions.
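The translation idea can be illustrated with a minimal, stdlib-only sketch. It assumes a hypothetical three-pair corpus, plain term counts instead of tf*idf weights, and a hand-written closed-form ridge solution without an intercept (scikit-learn's `Ridge` fits an intercept by default); none of the names below come from this project. Each description row is regressed onto its signature row, so the learned matrix `W` maps English words to weighted Java tokens.

```python
import re

# Hypothetical toy parallel corpus: description i documents signature i.
DESCRIPTIONS = [
    "returns the length of this string",
    "converts all characters to upper case",
    "returns the absolute value of an int",
]
SIGNATURES = ["int length()", "String toUpperCase()", "int abs(int a)"]


def tokenize(text):
    """Lowercase alphabetic tokens; 'toUpperCase()' becomes 'touppercase'."""
    return re.findall(r"[a-z]+", text.lower())


def vocabulary(docs):
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def count_matrix(docs, vocab):
    """Document-term count matrix: one row per document, one column per term."""
    index = {term: i for i, term in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok in tokenize(doc):
            if tok in index:  # words outside the vocabulary are ignored
                row[index[tok]] += 1.0
        rows.append(row)
    return rows


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Solve A X = B with Gauss-Jordan elimination and partial pivoting."""
    n = len(a)
    aug = [a[i][:] + b[i][:] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def fit_ridge(x, y, alpha=1.0):
    """Multi-output ridge without intercept: W = (X^T X + alpha*I)^-1 X^T Y."""
    xt = transpose(x)
    gram = matmul(xt, x)
    for i in range(len(gram)):
        gram[i][i] += alpha
    return solve(gram, matmul(xt, y))


def translate(query, w, src_vocab, tgt_vocab):
    """Predict a score for every Java signature term from an English query."""
    q = count_matrix([query], src_vocab)
    return dict(zip(tgt_vocab, matmul(q, w)[0]))


src_vocab = vocabulary(DESCRIPTIONS)
tgt_vocab = vocabulary(SIGNATURES)
W = fit_ridge(count_matrix(DESCRIPTIONS, src_vocab),
              count_matrix(SIGNATURES, tgt_vocab), alpha=1.0)

scores = translate("upper case", W, src_vocab, tgt_vocab)
print(sorted(scores.items(), key=lambda kv: -kv[1])[:2])
```

Here the query "upper case" puts equal weight on `string` and `touppercase`, the two tokens of the only signature whose description contains those words, and no weight on `abs`.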


  • Required packages: Gensim, scikit-learn, argparse and climate
  • First run to create a model
  • Run the search engine by running

  • can be used to create tf*idf vectors from texts
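A tf*idf vectorizer of the kind this bullet describes can be sketched in a few lines. This is an illustration under assumptions, not the project's code: it uses one common weighting (raw term frequency times ln(N/df), followed by L2 normalization), whereas Gensim's `TfidfModel` and scikit-learn's `TfidfVectorizer` each apply their own idf defaults.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse {term: weight} dict per text.

    weight = raw term frequency * ln(N / document frequency); every
    vector is then scaled to unit (L2) length.
    """
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    # Iterating a Counter yields each distinct term once, so this counts documents.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        # Terms found in every text get idf ln(1) = 0 and are dropped.
        vec = {t: tf * math.log(n / df[t]) for t, tf in doc.items() if df[t] < n}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


texts = [
    "returns the length of this string",
    "converts the string to upper case",
    "returns the absolute value of an int",
]
vectors = tfidf_vectors(texts)
print(sorted(vectors[0].items(), key=lambda kv: -kv[1]))
```

In `vectors[0]`, `length` (found in one text) outweighs `returns` (found in two), and `the` (found in all three) is dropped.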

  • contains methods to use Gensim's search interface
  • trains the regression model
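Gensim's search interface (for example its `similarities.MatrixSimilarity` index) ranks an indexed corpus by cosine similarity to a query vector. The stdlib sketch below mimics that idea for sparse `{term: weight}` dictionaries; the `CosineIndex` class, its `search` method and the toy vectors are hypothetical and do not reproduce Gensim's API. The query stands in for a translated bag-of-words vector such as a regression model might predict.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class CosineIndex:
    """Hypothetical toy index: stores labelled vectors, ranks them against a query."""

    def __init__(self, vectors, labels):
        self.vectors = list(vectors)
        self.labels = list(labels)

    def search(self, query, top_n=3):
        scored = [(cosine(query, v), label) for v, label in zip(self.vectors, self.labels)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(label, score) for score, label in scored[:top_n]]


# Toy signature vectors (raw counts) and a translated query vector.
signatures = ["int length()", "String toUpperCase()", "int abs(int a)"]
signature_vectors = [
    {"int": 1.0, "length": 1.0},
    {"string": 1.0, "touppercase": 1.0},
    {"int": 2.0, "abs": 1.0, "a": 1.0},
]
index = CosineIndex(signature_vectors, signatures)
query = {"string": 2 / 7, "touppercase": 2 / 7}
print(index.search(query, top_n=2))
```

Because cosine similarity ignores vector length, the small translated weights still match the `toUpperCase` signature perfectly.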