
Workshop:Algolit

resources

For coders

With a proper Python install, a single line should work:

$ pip3 install nltk gensim scikit-learn matplotlib tensorflow pattern3 setuptools

In case pip3 does not work (on Debian/Ubuntu):

$ sudo apt-get install python3-pip

Requirements

  • Python 3.x (comes with pip): https://www.python.org/downloads/release/python-370/
  • TensorFlow: https://www.tensorflow.org/install/pip
  • NLTK with all corpora
    1. Install NLTK
      • Instructions for Mac/Unix and Windows: https://www.nltk.org/install.html
      • Mac/Unix: should work with pip install nltk (possibly with sudo)
      • Windows: try pip first, otherwise https://pypi.org/project/nltk/
    2. Install the NLTK data (through the Python shell): https://www.nltk.org/data.html
  • Pattern for Python: https://pypi.org/project/pattern3/
    • https://www.clips.uantwerpen.be/pattern
  • Gensim
    • https://radimrehurek.com/gensim/install.html
    • pip install --upgrade gensim (possibly with sudo)
  • SciPy: https://www.scipy.org/install.html
  • scikit-learn: http://scikit-learn.org/stable/install.html (install SciPy first!)
  • matplotlib: https://matplotlib.org/users/installing.html
  • Virtual environment:
    • Ubuntu: https://www.linode.com/docs/development/python/create-a-python-virtualenv-on-ubuntu-1610/
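
A small standard-library script can check which of the requirements above are already importable. This is a minimal sketch, not part of the workshop repository. It only looks up import names, which for scikit-learn is `sklearn`. It does not check versions. It also leaves out Pattern, because the list above does not give its import name. The NLTK corpora are downloaded separately from a Python shell, for example with `nltk.download('all')`.

```python
import importlib.util

# Import names for the requirements; they can differ from the pip names
# (scikit-learn is imported as "sklearn"). Pattern is deliberately left out.
REQUIRED_MODULES = ["nltk", "gensim", "sklearn", "scipy", "matplotlib", "tensorflow"]


def check_modules(names):
    """Map each top-level module name to True if Python can find it."""
    # find_spec locates a module without importing it; it returns None if absent.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:12} {'ok' if found else 'MISSING'}")
```

Any module reported as MISSING can be installed with the pip3 line above.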

Git repository: https://gitlab.constantvzw.org/algolit/mundaneum

links

  • Workshop pad: https://pad.constantvzw.org/p/algolit-workshop-mons
Revision as of 08:32, 8 October 2018

Algolit-cover.jpg

programme (provisional)

Monday

Machine Learning & datasets

Tuesday

Text processing

  • 10:00-10:30: What is in a good/crappy dataset? Crawling Wikipedia, NLTK corpora, importing Twitter, Gutenberg, Mundaneum scans
  • 10:30-11:50: Choose the type of element you want the machine to score and measure/count
  • 10:30-10:50: Tokenization
  • 11:10-11:30: Stemming
  • 11:30-11:50: Part-of-speech tagging
  • 11:50-12:00: Break
  • 12:00-13:00: algoliterary reading using the scripts of texts that match the action
  • 13:00-14:00: Lunch
  • 14:00-17:00: Markov chain: history, explanation, game, script
    • Play the game
    • Try another dataset, vary the n-gram size
    • Read and understand the code, introduce start and end tokens
    • Modify the algorithm: chain on stems or POS tags?
  • 17:00-18:00: Presentation & sharing
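
The Markov chain exercise can be sketched in plain Python. This is a minimal, hypothetical version and not the workshop's own script. It uses a naive regular-expression tokenizer instead of NLTK; on the NLTK route, `nltk.word_tokenize`, `nltk.stem.PorterStemmer` and `nltk.pos_tag` cover the morning topics. The `<s>`/`</s>` start and end tokens and all function names are assumptions. The parameter `n` is the n-gram size: each next word is drawn from the words that followed the previous n-1 words in the corpus.

```python
import random
import re
from collections import defaultdict

START, END = "<s>", "</s>"  # assumed start/end tokens marking sentence boundaries


def tokenize(text):
    """Naive tokenizer: lowercase words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_chain(sentences, n=2):
    """Map every history of n-1 tokens to the list of tokens that followed it."""
    chain = defaultdict(list)
    for sentence in sentences:
        # Pad with start tokens so the first words also have a history.
        tokens = [START] * (n - 1) + tokenize(sentence) + [END]
        for i in range(len(tokens) - n + 1):
            history = tuple(tokens[i:i + n - 1])
            chain[history].append(tokens[i + n - 1])
    return dict(chain)


def generate(chain, n=2, max_words=30, rng=random):
    """Walk the chain from the start tokens until END or max_words words."""
    history = (START,) * (n - 1)
    words = []
    while len(words) < max_words:
        # Followers are stored with repetitions, so frequent ones are picked more often.
        word = rng.choice(chain[history])
        if word == END:
            break
        words.append(word)
        # Slide the window: keep the last n-1 tokens (empty when n == 1).
        history = (history + (word,))[1:]
    return " ".join(words)


if __name__ == "__main__":
    corpus = ["The cat sat on the mat.", "The dog sat on the log."]
    for n in (2, 3):
        chain = build_chain(corpus, n)
        print(n, generate(chain, n, rng=random.Random(1)))
```

A larger `n` makes the output stick closer to the source sentences. To chain on stems or POS tags, replace `tokenize` with a function that returns those units instead of words.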

Wednesday

From words to numbers / pattern recognition: statistical techniques

  • 10:00-12:00: Counting / creating vectors
    • Transforming human-readable text into numbers: information that classifiers can process and calculate with
  • 10:00-10:30: frequency count
  • 10:30-11:00: tf-idf
    • Normalize your numbers to be able to compare documents / paragraphs / sentences of different lengths
    • Demystifying vectors: show how the numbers form vectors (of the same length)
  • 11:00-11:15: Break
  • 11:15-11:45: one-hot vector / softmax
  • 11:45-12:15: most common words (word2vec)
  • 12:15-13:00: algoliterary reading using the scripts of texts that match the action
  • 13:00-14:00: Lunch
  • 14:00-15:00: Naive Bayes: history, explanation, game, script
  • 15:00-15:15: Break
  • 15:15-16:15: Linear Regression OR Perceptron: history, explanation with videos, script
  • 16:15-17:00: play the Naive Bayes game
  • 17:00-18:00: Brainstorm 1 on projects, in collaboration
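
Frequency counts and tf-idf take only a few lines of plain Python. Writing them out makes the "vectors of the same length" visible: every document becomes a list with one position per vocabulary word. This is a minimal sketch under three assumptions: whitespace tokenization, tf as count divided by document length, and idf = log(N / df). scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2 normalization by default, so its numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Assumed tokenization: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocabulary(docs):
    """Sorted list of all words in the corpus: one vector position per word."""
    return sorted({word for doc in docs for word in tokenize(doc)})


def count_vector(doc, vocab):
    """Raw frequency counts, one per vocabulary word, in vocabulary order."""
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocab]


def tfidf_vectors(docs):
    """Return the vocabulary and one tf-idf vector (same length) per document."""
    vocab = build_vocabulary(docs)
    counts = [count_vector(doc, vocab) for doc in docs]
    # Document frequency: in how many documents each word occurs.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # Words that occur in every document get idf = log(1) = 0.
    idf = [math.log(len(docs) / d) for d in df]
    vectors = []
    for row in counts:
        # Dividing by the document length makes long and short texts comparable.
        length = sum(row) or 1
        vectors.append([(c / length) * w for c, w in zip(row, idf)])
    return vocab, vectors


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat on the log"]
    vocab, vectors = tfidf_vectors(docs)
    print(vocab)
    for doc, vec in zip(docs, vectors):
        print(doc, [round(x, 3) for x in vec])
```

In the example, "the" and "sat" occur in both documents, so their weight drops to zero. "cat" occurs only in the first document, so it gets a positive weight there.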

Thursday

Evaluation methods

  • 10:00-10:15: baseline
  • 10:15-11:00: confusion matrix
  • 11:00-11:15: Break
  • 11:15-12:00: F-score, precision, recall
  • 12:00-13:00: Recurrent Neural Networks: average loss
  • 13:00-14:00: Lunch
  • 14:00-17:00: Questions & Brainstorm
  • 17:00-18:00: presentation & sharing
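
The evaluation measures of the morning can also be computed by hand. The sketch below is a minimal version with hypothetical labels ("spam"/"ham"). It covers a majority-class baseline, a confusion matrix, and precision, recall and F1 for one chosen positive label. In scikit-learn, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.precision_recall_fscore_support` provide the same measures.

```python
from collections import Counter


def majority_baseline(train_labels, n):
    """Baseline: always predict the most frequent label seen in training."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n


def confusion_matrix(gold, predicted, labels):
    """Row = true label, column = predicted label, cell = number of items."""
    pairs = Counter(zip(gold, predicted))
    return [[pairs[(true, pred)] for pred in labels] for true in labels]


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 (the F-score with beta = 1) for one label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["spam", "spam", "ham", "ham", "ham"]
    predicted = ["spam", "ham", "ham", "spam", "ham"]
    print(confusion_matrix(gold, predicted, ["spam", "ham"]))  # [[1, 1], [1, 2]]
    print(precision_recall_f1(gold, predicted, "spam"))  # (0.5, 0.5, 0.5)
    # Here the gold labels stand in for the training labels.
    baseline = majority_baseline(gold, len(gold))
    print(precision_recall_f1(gold, baseline, "spam"))  # (0.0, 0.0, 0.0)
```

The baseline shows why accuracy alone can mislead. Always guessing "ham" is right 3 times out of 5 here, yet it never finds a single "spam".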

Friday

Prototype ideas

  • 10:00-13:00: presentation, brainstorm, in collaboration
  • 13:00-14:00: Lunch
  • 14:00-17:00: fine-tuning & needs
  • 17:00-18:00: final round

visuals

VOID NOISE IS FULL OF WORDS.jpg

NOISE IS FULL OF WORDS


Pretty unicorn with flowers on.png

See http://t2i.cvalenzuelab.com; article about the process: http://aiweirdness.com/post/177091486527/this-ai-is-bad-at-drawing-but-will-try-anyways

80 Million Tiny Images.png

Visual Dictionary: click on the map to visualize the images in that region of the visual dictionary.

libraries