Workshop:Algolit

[[File:Algolit-cover.jpg]]
 
== programme (provisional) ==

=== Monday ===

Machine Learning & datasets

* 10:00-10:20: Introduction to Algolit http://www.algolit.net
** References: https://pad.constantvzw.org/p/algolit-workshop-mons
* 10:20-10:30: Introduction to Etherbox https://networksofonesown.constantvzw.org
* 10:30-11:00: Introduction to AI / machine learning
* 11:00-11:15: Break
* 11:15-11:45: Rule-based model: Markov chain
* 11:45-12:15: Supervised machine learning & labeled datasets: Naive Bayes & sentiment analysis
* 12:15-13:00: Unsupervised machine learning & Common Crawl: word2vec
* 13:00-14:00: Lunch
* 14:00-15:00: Guided tour of the Mundaneum in relation to dataset selection
* 15:00-17:00
** For non-coders: Oulipo exercise on datasets
** For coders: run the scripts in groups, depending on programming level
** Get the code running, play with / vary the input and output (visualisations)
** Mundaneum dataset: what to do with it?
** Labeled: nltk corpora
** Unlabeled: using gensim/tensorflow
** Common Crawl
** ImageNet:
*** https://distill.pub/2017/feature-visualization/
*** https://distill.pub/2018/building-blocks/
* Rule-based
* 17:00-18:00: Presentation & sharing
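
The Naive Bayes & sentiment analysis session above can be illustrated with a minimal sketch. It assumes a lowercase whitespace tokenizer, add-one (Laplace) smoothing and a tiny invented dataset; the function names and example sentences are hypothetical and are not the workshop's own scripts or the nltk corpora.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase and split on whitespace (nltk offers better tokenizers)
    return text.lower().split()


def train_naive_bayes(examples):
    """examples: list of (text, label) pairs -> (priors, word_counts, vocabulary)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> how often each word occurs with it
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    total = sum(label_counts.values())
    priors = {label: count / total for label, count in label_counts.items()}
    return priors, word_counts, vocabulary


def classify(text, priors, word_counts, vocabulary):
    """Return the label with the highest log-probability for the text."""
    scores = {}
    for label, prior in priors.items():
        total_words = sum(word_counts[label].values())
        score = math.log(prior)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # Add-one smoothing: an unseen (word, label) pair must not zero out the score
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented toy dataset, for illustration only
training_data = [
    ("what a wonderful inspiring archive", "positive"),
    ("lovely clear and wonderful scans", "positive"),
    ("boring dusty and confusing catalogue", "negative"),
    ("confusing scans and terrible light", "negative"),
]

model = train_naive_bayes(training_data)
print(classify("wonderful clear archive", *model))  # positive
print(classify("dusty confusing light", *model))    # negative
```

Summing log-probabilities instead of multiplying probabilities avoids numerical underflow on longer texts.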

=== Tuesday ===

Text processing

* 10:00-10:30: What makes a good/crappy dataset: crawling Wikipedia, nltk corpora, importing Twitter, Gutenberg, Mundaneum scans
* 10:30-11:50: Choose the type of element you want the machine to score and measure/count
* 10:30-10:50: Tokenization
* 11:10-11:30: Stemming
* 11:30-11:50: Part-of-speech tagging
* 11:50-12:00: Break
* 12:00-13:00: Algoliterary reading of texts that match the action, using the scripts
* 13:00-14:00: Lunch
* 14:00-17:00: Markov chain: history, explanation, game, script
** Play the game
** Try another dataset, vary the n-gram size
** Read and understand the code, introduce start and end tokens
** Modify the algorithm: chain on stems or POS tags?
* 17:00-18:00: Presentation & sharing
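
The Markov chain exercises (vary the n-gram size, introduce start and end tokens) can be sketched as a minimal word-level generator. It assumes whitespace tokenization, the hypothetical marker strings START and END as start and end tokens, and an invented three-sentence corpus; it is not the script used in the workshop.

```python
import random
from collections import defaultdict

# Hypothetical start/end markers; uppercase, so they cannot clash with the
# lowercased words of the corpus
START, END = "START", "END"


def build_chain(sentences, n=2):
    """Map each context of n-1 words to the list of words that followed it."""
    if n < 2:
        raise ValueError("n must be at least 2")
    chain = defaultdict(list)
    for sentence in sentences:
        # Pad with start tokens so the first word has a context; close with an end token
        tokens = [START] * (n - 1) + sentence.lower().split() + [END]
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            chain[context].append(tokens[i + n - 1])
    return chain


def generate(chain, n=2, max_words=30, rng=random):
    """Walk the chain from the start context until the end token (or max_words)."""
    context = (START,) * (n - 1)
    output = []
    for _ in range(max_words):
        # Frequent continuations appear more often in the list, so they are picked more often
        next_word = rng.choice(chain[context])
        if next_word == END:
            break
        output.append(next_word)
        context = context[1:] + (next_word,)
    return " ".join(output)


corpus = [
    "the archive holds the index cards",
    "the index cards hold the world",
    "the world is an archive",
]
rng = random.Random(42)  # fixed seed so a run can be repeated
for n in (2, 3):
    print(n, generate(build_chain(corpus, n), n, rng=rng))
```

Raising n makes each word depend on more of the preceding words, so the output stays closer to the source sentences.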

=== Wednesday ===

From words to numbers / Pattern recognition: statistical techniques

* 10:00-12:00: Counting / creating vectors
** Transforming human-readable text into numbers: information that classifiers can process and calculate with
* 10:00-10:30: Frequency count
* 10:30-11:00: tf-idf
** Normalize your numbers to be able to compare documents / paragraphs / sentences of different lengths
** Demystifying vectors: show how they form vectors (of the same length)
* 11:00-11:15: Break
* 11:15-11:45: One-hot vector / softmax
* 11:45-12:15: Most common words (word2vec)
* 12:15-13:00: Algoliterary reading of texts that match the action, using the scripts
* 13:00-14:00: Lunch
* 14:00-15:00: Naive Bayes: history, explanation, game, script
* 15:00-15:15: Break
* 15:15-16:15: Linear regression OR perceptron: history, explanation with videos, script
** https://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html
** http://www.arxiv-sanity.com/
** http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
** http://colah.github.io/
* 16:15-17:00: Play the Naive Bayes game
* 17:00-18:00: Brainstorm 1 on projects, in collaboration
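
The frequency count and tf-idf steps above can be sketched as follows: every document becomes a vector of the same length (one position per vocabulary word), and each count is divided by the document length so that texts of different lengths stay comparable. This assumes one common tf-idf variant, tf = count / document length and idf = log(N / df); libraries such as scikit-learn use a smoothed idf and other normalizations by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return a sorted vocabulary and one tf-idf vector per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word occur?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)       # plain frequency count
        length = len(tokens) or 1      # avoid dividing by zero on an empty text
        vectors.append([
            # Relative frequency keeps long and short texts comparable;
            # idf pushes words that occur in every document down to 0
            (counts[word] / length) * math.log(n_docs / df[word])
            for word in vocabulary
        ])
    return vocabulary, vectors


docs = ["the cat sat on the mat", "the dog sat", "the cat and the dog"]
vocabulary, vectors = tf_idf_vectors(docs)
print(vocabulary)  # ['and', 'cat', 'dog', 'mat', 'on', 'sat', 'the']
for vector in vectors:
    print([round(value, 3) for value in vector])  # same length for every document
```

Because "the" occurs in all three documents, its idf is log(1) = 0 and it disappears from every vector.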

=== Thursday ===

Evaluation methods

* 10:00-10:15: Baseline
* 10:15-11:00: Confusion matrix
* 11:00-11:15: Break
* 11:15-12:00: F-score, precision, recall
* 12:00-13:00: Recurrent neural networks: average loss
* 13:00-14:00: Lunch
* 14:00-17:00: Questions & brainstorm
* 17:00-18:00: Presentation & sharing
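
The morning's evaluation methods can be sketched for a two-label case: a majority-class baseline, a confusion matrix, and precision, recall and F1 (the balanced F-score) for one label treated as positive. The function names, labels and predictions are invented for illustration.

```python
from collections import Counter


def majority_baseline(train_labels, n):
    """Baseline: always predict the most frequent label seen in training."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * n


def confusion_matrix(gold, predicted, labels):
    """Rows are the true (gold) labels, columns the predicted labels."""
    pairs = Counter(zip(gold, predicted))
    return [[pairs[(g, p)] for p in labels] for g in labels]


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for one label treated as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented gold labels and predictions, for illustration only
labels = ["pos", "neg"]
gold = ["pos", "pos", "neg", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "pos", "pos", "neg"]

print(confusion_matrix(gold, predicted, labels))  # [[2, 1], [1, 2]]
print([round(x, 3) for x in precision_recall_f1(gold, predicted, "pos")])  # [0.667, 0.667, 0.667]

baseline = majority_baseline(["pos", "pos", "neg"], len(gold))
print([round(x, 3) for x in precision_recall_f1(gold, baseline, "pos")])  # [0.5, 1.0, 0.667]
```

The baseline reaches perfect recall for "pos" simply by never predicting "neg", which is why precision and the confusion matrix are needed alongside it.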

=== Friday ===

Prototype ideas

* 10:00-13:00: Presentation, brainstorm, in collaboration
* 13:00-14:00: Lunch
* 14:00-17:00: Fine-tuning & needs
* 17:00-18:00: Final round
  
 
Revision as of 08:26, 8 October 2018

== resources ==

=== visuals ===

[[File:VOID NOISE IS FULL OF WORDS.jpg]]

NOISE IS FULL OF WORDS

[[File:Pretty unicorn with flowers on.png]]

See http://t2i.cvalenzuelab.com; article about the process: http://aiweirdness.com/post/177091486527/this-ai-is-bad-at-drawing-but-will-try-anyways

[[File:80 Million Tiny Images.png]]

Visual Dictionary: click on the map to visualize the images in that region of the visual dictionary.

=== libraries ===