
Workshop:Algolit

Revision as of 08:56, 23 October 2018

Algolit-cover.jpg

projects

Florian

Brandon

Louise

Loris

Laetitia

Francois

The project sdkfnosdf hv

Kjnfoljkln df;bjkd;f

School website: http://artsaucarre.be

programme (provisional)

Monday

Machine learning & datasets

  • 10:00-10:20: Introduction to Algolit: http://www.algolit.net
  • 10:20-10:30: Introduction to Etherbox: https://networksofonesown.constantvzw.org
  • 10:30-11:00: Introduction to AI / machine learning
  • 11:00-11:15: Break
  • 11:15-11:45: Rule-based model: Markov chain
  • 11:45-12:15: Supervised machine learning & labeled datasets: Naive Bayes & sentiment analysis
  • 12:15-13:00: Unsupervised machine learning & Common Crawl: word2vec
  • 13:00-14:00: Lunch
  • 14:00-15:00: Guided tour of the Mundaneum in relation to dataset selection
  • 15:00-17:00:
    • For non-coders: Oulipo exercise on datasets
    • For coders: run the scripts in groups, depending on programming level
    • Get the code running, play with and vary the input and output (visualisations)
    • Mundaneum dataset: what to do with it?
    • Labeled: nltk corpora
    • Unlabeled: using gensim/tensorflow
    • Common Crawl
    • ImageNet:
      • https://distill.pub/2017/feature-visualization/
      • https://distill.pub/2018/building-blocks/
  • Rule-based
  • 17:00-18:00: Presentation & sharing

Tuesday

Text processing

  • 10:00-10:30: What makes a good or a crappy dataset: crawling Wikipedia, nltk corpora, importing Twitter, Gutenberg, Mundaneum scans
  • 10:30-11:50: Choose the type of element you want the machine to score and measure/count
  • 10:30-10:50: Tokenization
  • 11:10-11:30: Stemming
  • 11:30-11:50: Part-of-speech tagging
  • 11:50-12:00: Break
  • 12:00-13:00: Algoliterary reading using the scripts, of texts that match the action
  • 13:00-14:00: Lunch
  • 14:00-17:00: Markov chain: history, explanation, game, script
    • Play the game
    • Try another dataset, vary the n-gram size
    • Read and understand the code, introduce start and end tokens
    • Modify the algorithm: chain on stems or POS tags?
  • 17:00-18:00: Presentation & sharing

Wednesday

From words to numbers / pattern recognition: statistical techniques

  • 10:00-12:00: Counting / creating vectors
    • Transforming human-readable text into numbers: information that classifiers can process and calculate with
  • 10:00-10:30: Frequency count
  • 10:30-11:00: tf-idf
    • Normalize your numbers so you can compare documents / paragraphs / sentences of different lengths
    • Demystifying vectors: show how they form vectors (of the same length)
  • 11:00-11:15: Break
  • 11:15-11:45: One-hot vector / softmax
  • 11:45-12:15: Most common words (word2vec)
  • 12:15-13:00: Algoliterary reading using the scripts, of texts that match the action
  • 13:00-14:00: Lunch
  • 14:00-15:00: Naive Bayes: history, explanation, game, script
  • 15:00-15:15: Break
  • 15:15-16:15: Linear regression OR perceptron: history, explanation with videos, script
    • https://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html
    • http://www.arxiv-sanity.com/
    • http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
    • http://colah.github.io/
  • 16:15-17:00: Play the Naive Bayes game
  • 17:00-18:00: Brainstorm 1 on projects, in collaboration

Thursday

Evaluation methods

  • 10:00-10:15: Baseline
  • 10:15-11:00: Confusion matrix
  • 11:00-11:15: Break
  • 11:15-12:00: F-score, precision, recall
  • 12:00-13:00: Recurrent neural networks: average loss
  • 13:00-14:00: Lunch
  • 14:00-17:00: Questions & brainstorm
  • 17:00-18:00: Presentation & sharing

Friday

Prototype ideas

  • 10:00-13:00: Presentation, brainstorm, in collaboration
  • 13:00-14:00: Lunch
  • 14:00-17:00: Fine-tuning & needs
  • 17:00-18:00: Final round

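For the Tuesday tokenization and stemming sessions, here is a minimal standard-library sketch of what those two steps do. The workshop scripts presumably rely on nltk for this (for example `nltk.word_tokenize` and `nltk.stem.PorterStemmer`); the regex pattern and suffix list below are hypothetical simplifications, and the stemmer is a naive suffix stripper, not the Porter algorithm.

```python
import re

# Hypothetical tokenizer: a word (optionally with an apostrophe part) or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Split text into lowercase word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text.lower())

# Hypothetical suffix list; a naive stripper, NOT the Porter stemmer.
SUFFIXES = ("ing", "ly", "ed", "es", "s")

def stem(token):
    """Strip the first matching suffix, as long as at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

tokens = tokenize("The machines were reading books, quickly.")
print(tokens)                     # ['the', 'machines', 'were', 'reading', 'books', ',', 'quickly', '.']
print([stem(t) for t in tokens])  # ['the', 'machin', 'were', 'read', 'book', ',', 'quick', '.']
```

The crude "machin" shows why stems are useful for counting (machine, machines, machined collapse together) but are not always readable words.
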
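The Tuesday afternoon Markov chain exercises ask to vary the n-gram size and to introduce start and end tokens. A minimal sketch of a word-level chain doing both, assuming sentences are already tokenized; the token strings `<s>` and `</s>`, the function names and the toy corpus are hypothetical, not taken from the workshop scripts.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical start and end tokens

def build_chain(sentences, n=2):
    """Map each context of n-1 tokens to the list of tokens observed right after it."""
    if n < 2:
        raise ValueError("n must be at least 2")
    chain = defaultdict(list)
    for tokens in sentences:
        padded = [START] * (n - 1) + tokens + [END]
        for i in range(len(padded) - n + 1):
            context = tuple(padded[i : i + n - 1])
            chain[context].append(padded[i + n - 1])
    return chain

def generate(chain, n=2, max_len=30, rng=random):
    """Walk the chain from the start context until END is drawn or max_len tokens are produced."""
    context = (START,) * (n - 1)
    output = []
    while len(output) < max_len:
        # Duplicates in the list make frequent continuations more likely.
        nxt = rng.choice(chain[context])
        if nxt == END:
            break
        output.append(nxt)
        context = context[1:] + (nxt,)
    return output

corpus = [
    "the noise is full of words".split(),
    "the dataset is full of noise".split(),
]
chain = build_chain(corpus, n=2)
print(" ".join(generate(chain, n=2, rng=random.Random(42))))
```

Because every sentence ends with `</s>`, generation stops at plausible sentence ends instead of always running to `max_len`. Chaining on stems or POS tags only means passing those sequences instead of words.
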
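For the Wednesday frequency count and tf-idf sessions, a minimal sketch of how texts become vectors of the same length over a shared vocabulary. It assumes the plain variant idf = log(N / df) with relative term frequencies; scikit-learn's `TfidfVectorizer` uses a smoothed idf and normalizes vectors by default, so its numbers will differ. The function names and documents are hypothetical.

```python
import math
from collections import Counter

def term_frequencies(tokens):
    """Relative frequency of each token, so texts of different lengths can be compared."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}

def tf_idf(documents):
    """Return a sorted shared vocabulary and one tf-idf vector per document.

    idf = log(N / df): a word that appears in every document gets weight 0.
    """
    vocabulary = sorted({t for doc in documents for t in doc})
    n_docs = len(documents)
    df = Counter(t for doc in documents for t in set(doc))  # documents containing each term
    idf = {t: math.log(n_docs / df[t]) for t in vocabulary}
    vectors = []
    for doc in documents:
        tf = term_frequencies(doc)
        vectors.append([tf.get(t, 0.0) * idf[t] for t in vocabulary])
    return vocabulary, vectors

docs = [
    "the noise is full of words".split(),
    "the words are full of data".split(),
    "the data is noise".split(),
]
vocab, vecs = tf_idf(docs)
print(vocab)  # ['are', 'data', 'full', 'is', 'noise', 'of', 'the', 'words']
for vec in vecs:
    print([round(x, 3) for x in vec])
```

Every vector has one slot per vocabulary word, which is what makes documents of different lengths comparable as points in the same space.
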
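The Wednesday one-hot vector / softmax slot can be illustrated in a few lines: a one-hot vector encodes a word as a single 1 in a vocabulary-sized vector, and softmax turns a list of raw scores into probabilities. This is a plain-Python sketch with a hypothetical vocabulary; it subtracts the maximum score before exponentiating, a standard trick that leaves the result unchanged but avoids overflow.

```python
import math

def one_hot(word, vocabulary):
    """Vector of zeros with a single 1 at the word's position in the vocabulary."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max avoids overflow in math.exp
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocabulary = ["noise", "is", "full", "of", "words"]
print(one_hot("full", vocabulary))  # [0, 0, 1, 0, 0]
print(softmax([2.0, 1.0, 0.1]))     # roughly [0.659, 0.242, 0.099]
```
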
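For the Naive Bayes session (and Monday's sentiment analysis example), a minimal multinomial Naive Bayes with add-one (Laplace) smoothing, written out so every probability is visible. nltk ships its own `NaiveBayesClassifier`, which works on feature dictionaries rather than raw token lists; this sketch is not that implementation, and the toy training sentences and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (tokens, label). Returns label priors, per-label word counts, vocabulary."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in examples:
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    total = len(examples)
    priors = {label: count / total for label, count in label_counts.items()}
    return priors, word_counts, vocabulary

def classify(tokens, priors, word_counts, vocabulary):
    """Return the label with the highest log-probability, using add-one smoothing."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        counts = word_counts[label]
        denominator = sum(counts.values()) + len(vocabulary)
        score = math.log(prior)  # logs avoid multiplying many tiny probabilities
        for token in tokens:
            if token in vocabulary:  # words never seen in training are ignored
                score += math.log((counts[token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("i love this book".split(), "pos"),
    ("what a wonderful story".split(), "pos"),
    ("i hate this noise".split(), "neg"),
    ("what a boring story".split(), "neg"),
]
model = train_naive_bayes(training)
print(classify("a wonderful book".split(), *model))  # pos
print(classify("boring noise".split(), *model))      # neg
```

The "naive" part is the assumption that words are independent given the label, which is why the score is just a sum of per-word log-probabilities.
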
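The Wednesday "linear regression OR perceptron" slot (see the ConvNetJS 2D classification demo linked there) can be grounded with the classic perceptron learning rule: when a point is on the wrong side of the line, move the line toward it. A minimal sketch, assuming labels in {-1, +1} and linearly separable toy points chosen for illustration.

```python
def train_perceptron(points, labels, epochs=20, learning_rate=1.0):
    """Classic perceptron for labels in {-1, +1}; returns weights and bias."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(points, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
                errors += 1
        if errors == 0:  # every training point is on the correct side: stop early
            break
    return weights, bias

def predict(weights, bias, x):
    """+1 if the point lies on the positive side of the learned line, else -1."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1

points = [(2.0, 3.0), (3.0, 3.5), (-1.0, -2.0), (-2.0, -1.5)]
labels = [1, 1, -1, -1]
w, b = train_perceptron(points, labels)
print(w, b)                               # [2.0, 3.0] 1.0
print([predict(w, b, p) for p in points])  # [1, 1, -1, -1]
```

On data that no straight line can separate, the loop never reaches zero errors and simply stops after `epochs` passes.
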
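The Thursday evaluation sessions (baseline, confusion matrix, precision, recall, F-score) fit together in one small sketch: compare a classifier's predictions with a majority-class baseline, then break the errors down per label. The gold labels and predictions below are made-up illustration data, and the helper names are hypothetical.

```python
from collections import Counter

def majority_baseline(train_labels, n_predictions):
    """Always predict the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_predictions

def accuracy(gold, predicted):
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def confusion_matrix(gold, predicted, labels):
    """Nested dict: rows are gold labels, columns are predicted labels."""
    matrix = {g: {p: 0 for p in labels} for g in labels}
    for g, p in zip(gold, predicted):
        matrix[g][p] += 1
    return matrix

def precision_recall_f1(matrix, positive):
    tp = matrix[positive][positive]
    fp = sum(matrix[g][positive] for g in matrix if g != positive)  # predicted positive, gold other
    fn = sum(matrix[positive][p] for p in matrix[positive] if p != positive)  # gold positive, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold      = ["pos", "pos", "neg", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "pos", "pos", "neg"]
baseline = majority_baseline(["neg", "neg", "pos"], len(gold))

print(accuracy(gold, baseline))   # 0.5
print(accuracy(gold, predicted))  # 0.666...
m = confusion_matrix(gold, predicted, ["pos", "neg"])
print(m)  # {'pos': {'pos': 2, 'neg': 1}, 'neg': {'pos': 1, 'neg': 2}}
print(precision_recall_f1(m, "pos"))  # precision, recall and F1 are all 2/3 here
```

A classifier is only interesting if it beats the baseline; precision and recall then show whether its remaining errors are false alarms or misses.
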
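For the Thursday "Recurrent neural networks: average loss" slot: training scripts for next-character or next-word RNNs commonly report the average cross-entropy loss, the mean negative log-probability the model gave to the token that actually came next. This sketch assumes that is the loss meant; the hand-written probability tables stand in for real network outputs and are purely hypothetical.

```python
import math

def average_cross_entropy(predicted_distributions, targets):
    """Mean negative log-probability assigned to the correct next token.

    predicted_distributions: one dict {token: probability} per time step
    targets: the token that actually came next at each step
    (a probability of 0 for the target would make the loss infinite)
    """
    losses = [-math.log(dist[target]) for dist, target in zip(predicted_distributions, targets)]
    return sum(losses) / len(losses)

steps = [
    {"a": 0.5, "b": 0.25, "c": 0.25},
    {"a": 0.25, "b": 0.5, "c": 0.25},
]
print(average_cross_entropy(steps, ["a", "b"]))  # log(2), about 0.693
```

A perfect model scores 0, and a model that guesses uniformly over a vocabulary of size V scores log(V), which gives a rough scale for reading the loss curve during training.
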
resources

For coders

With a working Python 3 install, this single line should work:

$ pip3 install nltk gensim scikit-learn matplotlib tensorflow pattern3 setuptools

If pip3 is not installed (Debian/Ubuntu):

$ sudo apt-get install python3-pip

Requirements

Git repository: https://gitlab.constantvzw.org/algolit/mundaneum

links

online

Bot-or-not.png


Deep-flow.png


visuals

VOID NOISE IS FULL OF WORDS.jpg

NOISE IS FULL OF WORDS


Pretty unicorn with flowers on.png

See http://t2i.cvalenzuelab.com and the article about the process: http://aiweirdness.com/post/177091486527/this-ai-is-bad-at-drawing-but-will-try-anyways

80 Million Tiny Images.png

Visual Dictionary - Click on top of the map to visualize the images in that region of the visual dictionary.

libraries