Workshop:Algolit
[[File:Algolit-cover.jpg]]
Revision as of 08:26, 8 October 2018
programme (provisional)
Monday
Machine Learning & datasets
- 10:00-10:20: Introduction to Algolit: http://www.algolit.net
  - References: https://pad.constantvzw.org/p/algolit-workshop-mons
- 10:20-10:30: Introduction to Etherbox: https://networksofonesown.constantvzw.org
- 10:30-11:00: Introduction to AI / machine learning
- 11:00-11:15: Break
- 11:15-11:45: Rule-based model: Markov chain
- 11:45-12:15: Supervised machine learning & labeled datasets: Naive Bayes & sentiment analysis
- 12:15-13:00: Unsupervised machine learning & Common Crawl: word2vec
- 13:00-14:00: Lunch
- 14:00-15:00: Guided tour of the Mundaneum in relation to dataset selection
- 15:00-17:00:
  - For non-coders: Oulipo exercise on datasets
  - For coders: run the scripts in groups, depending on level of programming
  - Get the code running, play with and vary the input and output (visualisations)
  - Mundaneum dataset: what to do with it?
  - Labeled: NLTK corpora
  - Unlabeled: using gensim/TensorFlow
  - CommonCrawl
  - ImageNet:
    - https://distill.pub/2017/feature-visualization/
    - https://distill.pub/2018/building-blocks/
- Rule-based
- 17:00-18:00: Presentation & sharing
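The 11:45 session pairs a labeled dataset with a Naive Bayes sentiment classifier. A minimal sketch of that idea, assuming multinomial Naive Bayes with add-one (Laplace) smoothing, simple whitespace tokenization and a tiny hypothetical training set; it is not the workshop's own script, and `train`, `classify` and `TRAIN` are illustrative names.

```python
import math
from collections import Counter

# Hypothetical toy training data: (text, label) pairs, not a workshop dataset
TRAIN = [
    ("a wonderful, moving book", "pos"),
    ("i loved the characters", "pos"),
    ("a boring and predictable plot", "neg"),
    ("i hated the ending", "neg"),
]

def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    words = (w.strip(".,!?;:") for w in text.lower().split())
    return [w for w in words if w]

def train(examples):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        words = tokenize(text)
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        n_words = sum(word_counts[label].values())
        for w in tokenize(text):
            if w not in vocab:
                continue  # words never seen in training are skipped
            # add-one smoothing keeps unseen (word, label) pairs from giving log(0)
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
print(classify("i loved this wonderful book", model))  # → pos
print(classify("what a boring ending", model))  # → neg
```

Each label's score is its log prior plus the summed log likelihoods of the known words; working in logs avoids multiplying many small probabilities until they underflow to zero.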
Tuesday
Text processing
- 10:00-10:30: What is in a good/crappy dataset? Crawling Wikipedia, NLTK corpora / importing Twitter, Gutenberg, Mundaneum scans
- 10:30-11:50: Choose the type of element you want the machine to score and measure/count
- 10:30-10:50: Tokenization
- 11:10-11:30: Stemming
- 11:30-11:50: Part-of-speech tagging
- 11:50-12:00: Break
- 12:00-13:00: Algoliterary reading: use the scripts on texts that match the action
- 13:00-14:00: Lunch
- 14:00-17:00: Markov chain: history, explanation, game, script
  - Play the game
  - Try another dataset, vary the n-gram size
  - Read and understand the code, introduce start and end tokens
  - Modify the algorithm: chain on stems or POS tags?
- 17:00-18:00: Presentation & sharing
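The Markov chain exercises (varying the n-gram size, introducing start and end tokens) can be sketched as a word-level chain. This is an illustrative version under stated assumptions: sentences are split on whitespace, `<s>` and `</s>` are hypothetical marker tokens, and `build_chain` and `generate` are names chosen for this example rather than the workshop's script.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary tokens

def build_chain(sentences, n=2):
    """Map each (n-1)-word context to the list of words that followed it."""
    chain = defaultdict(list)
    for sentence in sentences:
        # pad with n-1 start tokens so the first word also has a context
        words = [START] * (n - 1) + sentence.split() + [END]
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            chain[context].append(words[i + n - 1])
    return chain

def generate(chain, n=2, rng=random, max_words=50):
    """Walk the chain from the start context until the end token is drawn."""
    context = (START,) * (n - 1)
    output = []
    while len(output) < max_words:
        # repeated followers appear several times, so frequent ones are likelier
        next_word = rng.choice(chain[context])
        if next_word == END:
            break
        output.append(next_word)
        # slide the window: append the new word, drop the oldest one
        context = (context + (next_word,))[1:]
    return " ".join(output)

# Hypothetical toy corpus
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
chain = build_chain(corpus, n=2)
print(generate(chain, n=2, rng=random.Random(1)))
```

With `n=2` each word depends only on the previous one; raising `n` makes the output reproduce longer stretches of the source text. Passing a seeded `random.Random(...)` as `rng` makes a run reproducible.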
Wednesday
From words to numbers / Pattern recognition: statistical techniques
- 10:00-12:00: Counting / creating vectors
  - Transforming human-readable text into numbers: information that classifiers can process and calculate with
- 10:00-10:30: Frequency count
- 10:30-11:00: tf-idf
  - Normalize your numbers to be able to compare documents / paragraphs / sentences of different lengths
  - Demystifying vectors: show how they form vectors (of the same length)
- 11:00-11:15: Break
- 11:15-11:45: One-hot vectors / softmax
- 11:45-12:15: Most common words (word2vec)
- 12:15-13:00: Algoliterary reading: use the scripts on texts that match the action
- 13:00-14:00: Lunch
- 14:00-15:00: Naive Bayes: history, explanation, game, script
- 15:00-15:15: Break
- 15:15-16:15: Linear regression OR perceptron: history, explanation with videos, script
  - https://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html
  - http://www.arxiv-sanity.com/
  - http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
  - http://colah.github.io/
- 16:15-17:00: Play the Naive Bayes game
- 17:00-18:00: Brainstorm 1 on projects, in collaboration
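Frequency counts and tf-idf are what turn texts into vectors of the same length. A minimal sketch, assuming lowercase whitespace tokenization, term frequency as the count divided by the document length, and the unsmoothed idf log(N / df); libraries such as scikit-learn use smoothed variants by default, so their numbers will differ. `tf_idf_vectors` is a hypothetical helper name.

```python
import math
from collections import Counter

def tf_idf_vectors(documents):
    """Turn each document into a vector over one shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({w for words in tokenized for w in words})
    n_docs = len(tokenized)
    # document frequency: in how many documents does each word occur?
    df = Counter(w for words in tokenized for w in set(words))
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        length = len(words) or 1  # avoid dividing by zero for an empty document
        vector = []
        for w in vocabulary:
            tf = counts[w] / length  # relative frequency: normalizes for document length
            idf = math.log(n_docs / df[w])  # a word found in every document gets 0
            vector.append(tf * idf)
        vectors.append(vector)
    return vocabulary, vectors

docs = ["the cat sat", "the dog sat down"]
vocabulary, vectors = tf_idf_vectors(docs)
print(vocabulary)  # → ['cat', 'dog', 'down', 'sat', 'the']
for doc, vector in zip(docs, vectors):
    print(doc, [round(x, 3) for x in vector])
```

Every document gets one number per vocabulary word, so all vectors share the same length and the same positions; a word that occurs in every document (here `the` and `sat`) scores 0 everywhere because it does not help tell the documents apart.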
Thursday
Evaluation methods
- 10:00-10:15: Baseline
- 10:15-11:00: Confusion matrix
- 11:00-11:15: Break
- 11:15-12:00: F-score, precision, recall
- 12:00-13:00: Recurrent neural networks: average loss
- 13:00-14:00: Lunch
- 14:00-17:00: Questions & brainstorm
- 17:00-18:00: Presentation & sharing
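The morning's evaluation measures can be computed by hand from gold and predicted labels. A minimal sketch, assuming one label is treated as the positive class and a majority-class baseline (always predicting the most frequent training label); the labels below are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter

def confusion_matrix(gold, predicted):
    """Count how often each (gold label, predicted label) pair occurs."""
    return Counter(zip(gold, predicted))

def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for one label treated as the positive class."""
    matrix = confusion_matrix(gold, predicted)
    tp = matrix[(positive, positive)]
    # false positives: predicted positive, but gold says otherwise
    fp = sum(n for (g, p), n in matrix.items() if p == positive and g != positive)
    # false negatives: gold is positive, but the prediction missed it
    fn = sum(n for (g, p), n in matrix.items() if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def majority_baseline(train_labels, n):
    """Predict the most frequent training label for all n test items."""
    label = Counter(train_labels).most_common(1)[0][0]
    return [label] * n

# Made-up labels for illustration only
gold = ["pos", "pos", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos", "pos"]
baseline = majority_baseline(["pos", "pos", "neg"], len(gold))

print(confusion_matrix(gold, predicted))
print(precision_recall_f1(gold, predicted, "pos"))
print(precision_recall_f1(gold, baseline, "pos"))
```

In this made-up example the majority baseline reaches a higher F-score for `pos` (0.75) than the "classifier" (about 0.67), which is why a baseline comes first: a score only means something next to what a trivial strategy achieves.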
Friday
Prototype ideas
- 10:00-13:00: Presentation, brainstorm, in collaboration
- 13:00-14:00: Lunch
- 14:00-17:00: Fine-tuning & needs
- 17:00-18:00: Final round
resources
visuals
See http://t2i.cvalenzuelab.com; article about the process: http://aiweirdness.com/post/177091486527/this-ai-is-bad-at-drawing-but-will-try-anyways
libraries
- TensorFlow: https://www.tensorflow.org
- GloVe: https://nlp.stanford.edu/projects/glove