Overview

I - DESCRIPTION

PSP

1. Scraping

Can be done by running preprocessing/scrape_psp.sh 2017ps, or run with no arguments to see usage. This recursively downloads the PSP stenographic records, checks the timestamps of existing files, and only downloads files for which a newer version is present on the server.
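
A minimal sketch of the "only download if newer" check, in Python (the URL and local path below are hypothetical examples, and it assumes the server sends a Last-Modified header; the real logic lives in preprocessing/scrape_psp.sh):

    # Download a stenographic record page only if the server copy is newer.
    import os
    import email.utils
    import requests

    url = "https://www.psp.cz/eknih/2017ps/stenprot/001schuz/index.htm"  # hypothetical example page
    local = "downloaded/2017ps/001schuz/index.htm"                       # hypothetical local path

    head = requests.head(url)
    remote_ts = email.utils.parsedate_to_datetime(head.headers["Last-Modified"]).timestamp()

    if not os.path.exists(local) or os.path.getmtime(local) < remote_ts:
        resp = requests.get(url)
        os.makedirs(os.path.dirname(local), exist_ok=True)
        with open(local, "wb") as f:
            f.write(resp.content)
        os.utime(local, (remote_ts, remote_ts))  # remember the server timestamp for the next run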

2. Parsing

Is started by running preprocessing/run_parsing.sh 2017ps; this parses the 2017ps directory in the downloaded archives (created by the scraping step above). The result is written to a file named like 2017ps_2018-06-26.jsonl, where the timestamp 2018-06-26 means the script was run on 26 June 2018. The order of rows in the file is deterministic, so the new and old files can be diffed.
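
A small sketch of the naming and ordering convention (the record fields are toy examples, not the real schema):

    # Write a date-stamped JSONL file with deterministic row order, so that
    # two runs of the parser can be compared with a plain diff.
    import json
    import datetime

    records = [                                     # toy records standing in for parsed utterances
        {"schuze": 1, "poradi": 2, "text": "..."},
        {"schuze": 1, "poradi": 1, "text": "..."},
    ]
    records.sort(key=lambda r: (r["schuze"], r["poradi"]))  # fixed order => diffable output

    out_name = "2017ps_{}.jsonl".format(datetime.date.today().isoformat())
    with open(out_name, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=True, sort_keys=True) + "\n")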

Senat

TODO

Parties

Scrape & parse

Is done by preprocessing/scrape_n_parse_parties.sh

The idea is to scrape https://www.psp.cz/sqw/organy2.sqw?k=1 to get a list of all members of parliament along with their party membership, committee ("vybor") participation, etc.
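
A hedged sketch of that scraping step in Python (the table layout, column order and page encoding are assumptions made for illustration; the real selectors live in the preprocessing scripts):

    # Scrape the list of MPs with their club from organy2.sqw.
    import json
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get("https://www.psp.cz/sqw/organy2.sqw?k=1")
    resp.encoding = "windows-1250"  # assumption: psp.cz pages are not served as UTF-8
    soup = BeautifulSoup(resp.text, "html.parser")

    mps = []
    for row in soup.select("table tr"):
        cells = [c.get_text(strip=True) for c in row.find_all("td")]
        if len(cells) >= 2:                      # assumed layout: name column + club column
            mps.append({"jmeno": cells[0], "klub": cells[1]})

    with open("psp_list_of_mps.jsonl", "w", encoding="utf-8") as f:
        for mp in mps:
            f.write(json.dumps(mp, ensure_ascii=True) + "\n")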

Files

psp_list_of_mps.jsonl

Each line is a JSON dict describing one member of parliament.

{
    "jmeno": "V\u011bra Ad\u00e1mkov\u00e1",
    "plne_jmeno": "prof. MUDr.\u00a0V\u011bra\u00a0Ad\u00e1mkov\u00e1,\u00a0CSc.",
    "volebni_kraj": "Hlavn\u00ed m\u011bsto Praha",
    "klub": "ANO",
    "vybory": ["VZ"],
    "komise": [],
    "delegace": []
}
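
Reading the file is one json.loads per line, e.g.:

    # Load psp_list_of_mps.jsonl and pick the MPs of one club.
    import json

    with open("psp_list_of_mps.jsonl", encoding="utf-8") as f:
        mps = [json.loads(line) for line in f]

    ano_mps = [mp["jmeno"] for mp in mps if mp["klub"] == "ANO"]
    print(len(ano_mps), "MPs in the ANO club")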

psp_sqw_index.json

An index from all values in the previous file (such as "ANO", "VZ", "Hlavn\u00ed m\u011bsto Praha") to the sqw index of the PSP database (on the psp.cz web). E.g. for:

    "volebni_kraj": "Olomouck\u00fd"

we have

    "Olomouck\u00fd": "592"

in the index file.

Each index value is unique and can be used to build a link to the details on the original website, such as:

https://www.psp.cz/sqw/snem.sqw?id=592

which contains a list of all MPs from the Olomouc region (Olomoucký kraj).
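
For example, assuming the file is a flat mapping as shown above, a detail link can be built like this:

    # Turn an entry of psp_sqw_index.json into a psp.cz detail link.
    import json

    with open("psp_sqw_index.json", encoding="utf-8") as f:
        sqw_index = json.load(f)

    kraj = "Olomouck\u00fd"
    print("https://www.psp.cz/sqw/snem.sqw?id={}".format(sqw_index[kraj]))
    # -> https://www.psp.cz/sqw/snem.sqw?id=592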

3. Analysis

Stage 0

3a) People Stats

  • utterance_count - number of unique utterances in parliament
  • mean_word_count_per_utterance - mean number of words per utterance
  • median_word_count_per_utterance - median of the same
  • std_word_count_per_utterance - standard deviation of the same
  • word_count - total number of words
  • unique_word_count - total number of unique words (without diacritics, lowercase); a computation sketch follows after this list
"role_predicates"
  • is_mp - member of parliament
  • is_minister - minister, member of the government
  • is_senator - senator
  • is_czech_president - the Czech president
  • is_moderator - the (vice-)chairman of the PSP, i.e. the presiding officer
  • is_czech_cabinet_member - member of the government (prime minister, deputy prime minister)
  • is_guest - neither an MP nor a moderator; i.e. when someone speaks as the prime minister (even though Babiš is an MP) they are a "guest", just like e.g. the governor ("hejtman") of the Central Bohemian Region
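
A minimal sketch of the word-count statistics above (the utterances are toy data; the real pipeline reads the parsed 2017ps JSONL files, and whether the population or sample standard deviation is used is an assumption here):

    # Per-speaker word-count statistics, with diacritics stripped and
    # lowercasing applied before counting unique words.
    import statistics
    import unicodedata

    def normalize(word):
        # lowercase and strip diacritics ("bez hacku a carek")
        decomposed = unicodedata.normalize("NFKD", word.lower())
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    utterances = ["Vážené paní poslankyně", "vážení páni poslanci a senátoři"]  # toy data
    word_counts = [len(u.split()) for u in utterances]

    stats = {
        "utterance_count": len(utterances),
        "mean_word_count_per_utterance": statistics.mean(word_counts),
        "median_word_count_per_utterance": statistics.median(word_counts),
        "std_word_count_per_utterance": statistics.pstdev(word_counts),
        "word_count": sum(word_counts),
        "unique_word_count": len({normalize(w) for u in utterances for w in u.split()}),
    }
    print(stats)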

3b) Elasticsearch Keywords

Input: 2017ps-pos_tagged.jsonl
Output: keywords.json
Implemented as an Elasticsearch significant_text query over the "text" field of the utterances. Steps: 1) import the utterances into the database, 2) query and export the result.
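
A hedged sketch of the two steps with the Python Elasticsearch client (the index name, the speaker filter field and the exact client call style depend on the Elasticsearch version in use and are assumptions here):

    # 1) import the utterances, 2) ask for significant_text keywords.
    import json
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    with open("2017ps-pos_tagged.jsonl", encoding="utf-8") as f:
        for i, line in enumerate(f):
            es.index(index="utterances", id=i, body=json.loads(line))

    resp = es.search(index="utterances", body={
        "size": 0,
        "query": {"match": {"speaker": "Andrej Babis"}},   # hypothetical filter field
        "aggs": {"keywords": {"significant_text": {"field": "text"}}},
    })
    print([b["key"] for b in resp["aggregations"]["keywords"]["buckets"]])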

4. API

Flask server
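
A minimal sketch of what one endpoint might look like (the route and the stats file path are hypothetical; the real server lives in ./api):

    # Serve precomputed per-person stats over a small JSON API.
    import json
    from flask import Flask, jsonify

    app = Flask(__name__)

    with open("data/people_stats.json", encoding="utf-8") as f:  # hypothetical path
        PEOPLE_STATS = json.load(f)

    @app.route("/api/people/<name>")
    def person_stats(name):
        return jsonify(PEOPLE_STATS.get(name, {}))

    if __name__ == "__main__":
        app.run(port=5000)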

5. Frontend

Vue.js frontend

II - DEPLOYMENT & USAGE

Without docker (for development)

Prerequisites: python3, pip3, nodejs, npm, make; crond & whoknowswhat

II.Without docker.1 - Install docker-server, docker-compose and elasticsearch

II.Without docker.2 - Install dependencies manually

  • virtualenv -p /usr/bin/python3 /tmp/venv/
  • source /tmp/venv/bin/activate
  • cd ./api && pip3 install -r requirements.txt
  • cd ./preprocessor && pip3 install -r requirements.txt
  • cd ./frontend && npm install

II.Without docker.3 - Start processor & app, have fun

  • git clone parlamenticon
  • cd parlamenticon
  • make
  • cd ./api && ./server.py
  • cd ./frontend && npm run dev

With docker

Prerequisites: docker-server, crond (plus some nginx setup for test and production)

II.With docker.A - local (for local testing of docker images and dockerfiles), frontend @ localhost:8080, API @ localhost:5000

  • cd parlamenticon
  • docker-compose build
  • docker-compose up processor
  • docker-compose up -d frontend && docker-compose up -d api

AMAZON (doc update in progress, partly obsolete)

II.With docker on amazon.B - test deployment (for testing docker images and processes on parlamenticon), frontend @ localhost:9090 accessible from the internet, API @ localhost:8090

  • cd parlamenticon
  • optionally, since we don't have a lot of space on the instance: docker system prune
  • docker-compose build
  • docker-compose up test_processor
  • docker-compose up -d test_frontend && docker-compose up -d test_api
  • optionally: set up cron via cd parlamenticon && ./processor/utils/install_cron_script.sh test

II.With docker on amazon.C - production deployment (be careful), frontend @ localhost:8080, API @ localhost:5000

  • cd parlamenticon
  • ./config.sh production
  • optionally, as we don't have a lot of space on the instance: docker system prune
  • docker-compose build
  • docker-compose up production_processor
  • docker-compose up -d production_frontend && docker-compose up -d production_api
  • optionally: set up cron via cd parlamenticon && ./processor/utils/install_cron_script.sh production

II.With docker.X - Stopping services etc.

  • cd parlamenticon
  • docker-compose stop test_frontend
  • docker-compose build test_frontend
  • docker-compose up -d test_frontend

  • similarly for the api and processor services, and for the local, test and production variants

II.With docker.Y - Details about controllers

Look for info in:
  • Makefile
  • Dockerfile
  • cron
  • viewing data in docker volumes: docker run -it -v parlamenticon_test_data:/data -v parlamenticon_test_dist:/dist busybox