
Workshop:Algolit

Revision as of 09:09, 8 October 2018

Algolit-cover.jpg

programme (provisional)

Monday

Machine Learning & datasets

Tuesday

Text processing

  • 10:00-10:30: What makes a good/crappy dataset: crawl Wikipedia, nltk corpora/import Twitter, Gutenberg, Mundaneum scans
  • 10:30-11:50 Choose the type of element you want the machine to score and measure/count
  • 10:30-10:50 Tokenization
  • 11:10-11:30 Stemming
  • 11:30-11:50 Part-of-speech
  • 11:50-12:00 Break
  • 12:00-13:00 algoliterary reading using the scripts of texts that match the action
  • 13:00-14:00: Lunch
  • 14:00-17:00 Markov Chain: history, explanation, game, script
    • Play the game
    • Try another dataset, vary n-gram size
    • Read and understand the code, introduce start and end tokens.
    • Modify the algorithm, chain on stems or POS tags?
  • 17:00-18:00 Presentation & sharing
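
As a starting point for the Markov chain session, here is a minimal sketch in plain Python of a word-level chain with a variable n-gram size and start and end tokens. It is not the workshop's own script: the function names, the `<s>` and `</s>` markers and the whitespace tokenizer are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical marker names, chosen for this sketch only.
START, END = "<s>", "</s>"

def tokenize(sentence):
    # Naive whitespace tokenization; a library tokenizer handles punctuation better.
    return sentence.lower().split()

def build_chain(sentences, n=2):
    """Map each state of (n - 1) tokens to the list of tokens seen after it."""
    order = n - 1
    chain = defaultdict(list)
    for sentence in sentences:
        # Pad with start tokens so the first real word has a state before it.
        tokens = [START] * order + tokenize(sentence) + [END]
        for i in range(len(tokens) - order):
            state = tuple(tokens[i:i + order])
            chain[state].append(tokens[i + order])
    return chain

def generate(chain, n=2, max_tokens=30, rng=random):
    """Walk the chain from the start state until the end token or max_tokens."""
    order = n - 1
    state = (START,) * order
    output = []
    while len(output) < max_tokens:
        next_token = rng.choice(chain[state])
        if next_token == END:
            break
        output.append(next_token)
        # Slide the window: drop the oldest token, append the new one.
        state = (state + (next_token,))[1:]
    return " ".join(output)

corpus = ["the cat sat", "the dog sat"]
chain = build_chain(corpus, n=2)
print(generate(chain, n=2))
```

Raising `n` makes the output follow the source text more closely; chaining on stems or POS tags only requires swapping `tokenize` for a function that returns those instead of words.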

Wednesday

From words to numbers / Pattern recognition - statistical techniques

  • 10:00-12:00 Counting / creating vectors
    • Transforming human readable text into numbers: information classifiers can process/calculate with.
  • 10:00-10:30 frequency count
  • 10:30-11:00 tf-idf
    • Normalize your numbers to be able to compare documents / paragraphs / sentences of different lengths
    • Demystifying vectors: show how they form vectors (of the same length)
  • 11:00-11:15: Break
  • 11:15-11:45 one-hot-vector / softmax
  • 11:45-12:15 most common words (word2vec)
  • 12:15-13:00: algoliterary reading using the scripts of texts that match the action
  • 13:00-14:00: Lunch
  • 14:00-15:00 Naive Bayes: history, explanation, game, script
  • 15:00-15:15: Break
  • 15:15-16:15 Linear Regression OR Perceptron: history, explanation with videos, script
  • 16:15-17:00: play the Naive Bayes game
  • 17:00-18:00: Brainstorm 1 on projects, in collaboration
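
The counting steps of the morning can be sketched in plain Python as follows: a frequency count normalized by document length, then tf-idf weights laid out over a shared vocabulary so that every document becomes a vector of the same length. This is an illustrative sketch, not the workshop's script. The function names are hypothetical, and the idf used here is the plain log(N / df) variant; libraries such as scikit-learn use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

def term_frequencies(tokens):
    # Relative frequency: dividing by length makes documents of
    # different sizes comparable.
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}

def tf_idf_vectors(documents):
    """Return the sorted vocabulary and one tf-idf vector per document."""
    # Naive whitespace tokenization, assumed for this sketch.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    # Unsmoothed idf: a term found in every document gets weight 0.
    idf = {t: math.log(n_docs / doc_freq[t]) for t in vocabulary}
    vectors = []
    for tokens in tokenized:
        tf = term_frequencies(tokens)
        # One position per vocabulary term, so all vectors share a length.
        vectors.append([tf.get(t, 0.0) * idf[t] for t in vocabulary])
    return vocabulary, vectors

docs = ["the cat sat", "the dog sat on the mat"]
vocabulary, vectors = tf_idf_vectors(docs)
for term, weight in zip(vocabulary, vectors[0]):
    print(term, round(weight, 3))
```

Words that occur in both documents, such as "the" and "sat", end up with weight 0, which is the point of tf-idf: it pushes down terms that do not help tell documents apart.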

Thursday

Evaluation methods

  • 10:00-10:15: baseline
  • 10:15-11:00: confusion matrix
  • 11:00-11:15: Break
  • 11:15-12:00: F-score, precision, recall
  • 12:00-13:00 Recurrent Neural Networks - average loss
  • 13:00-14:00: Lunch
  • 14:00-17:00: Questions & Brainstorm
  • 17:00-18:00: presentation & sharing
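
The evaluation measures of the morning fit in a few lines of plain Python. The sketch below computes a majority-class baseline, the four cells of a binary confusion matrix, and precision, recall and F-score from them. The function names and the "bot" / "human" labels are made up for illustration, and taking the majority class as the baseline is an assumption: it is one common choice, not necessarily the one used in the workshop.

```python
from collections import Counter

def majority_baseline_accuracy(actual):
    # Accuracy of always predicting the most frequent label.
    most_common_count = Counter(actual).most_common(1)[0][1]
    return most_common_count / len(actual)

def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one positive label."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts

def precision_recall_f1(counts):
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    # Precision: of everything predicted positive, how much was right?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything actually positive, how much was found?
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score (F1): harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical labels for illustration only.
actual = ["bot", "bot", "human", "human", "bot"]
predicted = ["bot", "human", "human", "bot", "bot"]

counts = confusion_counts(actual, predicted, positive="bot")
print(dict(counts))
print(precision_recall_f1(counts))
print(majority_baseline_accuracy(actual))
```

A classifier is only interesting if it beats the baseline, which is why the baseline comes first in the schedule.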

Friday

Prototype ideas

  • 10:00-13:00 presentation, brainstorm, in collaboration
  • 13:00-14:00: Lunch
  • 14:00-17:00 finetuning & needs
  • 17:00-18:00 final round

resources

For coders

With a working Python 3 install, a single line should be enough:

$ pip3 install nltk gensim scikit-learn matplotlib tensorflow pattern3 setuptools

If pip3 is not available, install it first (on Debian/Ubuntu):

$ sudo apt-get install python3-pip

Requirements

Git-repository: https://gitlab.constantvzw.org/algolit/mundaneum

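links

  • Workshop announcement: http://www.algolit.net/index.php/Mundaneum_workshop
  • Python course and notes: https://gitlab.com/arts2artsnumeriques/toolbox/tree/master/python

online

Bot-or-not.png

  • http://botpoet.com

Deep-flow.png

  • https://deep-flow.nl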

visuals

VOID NOISE IS FULL OF WORDS.jpg

NOISE IS FULL OF WORDS


Pretty unicorn with flowers on.png

See http://t2i.cvalenzuelab.com - article about the process: http://aiweirdness.com/post/177091486527/this-ai-is-bad-at-drawing-but-will-try-anyways

80 Million Tiny Images.png

Visual Dictionary - Click on top of the map to visualize the images in that region of the visual dictionary.

libraries