oriented toward understanding and mathematics
rather than impressive demonstration
– I will try to answer some of your questions –
2024-05-13
An example of mine from 25 years ago.
A simple perceptron-type neural network to assign amino-acid signals in protein 1H NMR spectra.
- input layer : 32
- hidden layer: 4
- output layer: 20
- software: MATLAB module
- training time : long !
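For illustration, such an architecture takes only a few lines today; below is a minimal sketch of a comparable 32-4-20 network, assuming PyTorch (introduced later) and a sigmoid activation – both are assumptions, the original was a MATLAB module.

```python
# Hypothetical re-creation of the 32-4-20 perceptron sketched above, in PyTorch.
# The activation choice is an assumption; the original was a MATLAB module.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(32, 4),   # input layer (32) -> hidden layer (4)
    nn.Sigmoid(),       # perceptron-era activation
    nn.Linear(4, 20),   # hidden layer (4) -> output layer (20 amino acids)
)
print(model)
```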
A well-known example
Analysis of a system, from measurements to model
DATA
can be anything
- images
- distributions
- values
- classification
\(N\) measurements
\(X_n \quad n : \{1 \cdots N\}\)
each \(X_n\) contains
1 or more “features”
\(\Rightarrow\) stored in a matrix \(X\)
NOT in Excel !
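A minimal sketch of what "stored in a matrix \(X\)" looks like in practice, assuming numpy and an arbitrary number of features per measurement:

```python
import numpy as np

N, F = 100, 3              # N measurements, F features each (illustrative sizes)
X = np.random.randn(N, F)  # one row per measurement X_n, one column per feature
print(X.shape)             # (100, 3)
```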
MODEL
can be anything
- analog
- equation
- program
- neural network
\(P\) parameters
\(M_p \quad p : \{1 \cdots P\}\)
\(\Rightarrow\) stored in a program
NOT in Excel !
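As a sketch, "stored in a program" can be as simple as a function carrying its \(P\) parameters; here an illustrative linear model (my choice, not from the slides):

```python
import numpy as np

def model(X, M):
    """Illustrative model: a linear map; M = (weights, bias)."""
    w, b = M[:-1], M[-1]
    return X @ w + b       # one predicted value per measurement X_n

F = 3
M = np.zeros(F + 1)        # P = F + 1 parameters, stored in the program, not in Excel
```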
Question
can be many things
- regression
- classification
- clustering
- model confirmation
- inversion
- denoising
- generative
“NOT in Excel” could be a definition of modern ML !
\(\Rightarrow\) trained model
to do so:
build a target function \(T\) which measures the mismatch (or a similarity)
1/ some answers \(A\) are known - supervised training \(\;\equiv\quad\) knowledge extension
2/ No known answers - unsupervised training \(\;\equiv\quad\) knowledge extraction
1/ supervised training
some answers \(A\) are known - build \(T\) using the answers :
\[T(M) = d( M(X), A )\qquad \text{where } d \text{ is a distance}\]
2/ unsupervised training
No known answers - build \(T\) using other kinds of information:
3/ combine both approaches \[T(M) = d( M(X), A ) + \alpha f(M)\] Regularisation approach – quite common in practice
In all cases, find the set of parameters \(M_p\) so that \(T(M)\) is optimal
minimum if \(d()\) is a distance – maximum if \(d()\) is a similarity
The distance can be the Euclidean distance: \[d(a,b) = \sqrt{ \sum_i (a_i - b_i)^2 }\] (also called the \(\ell_2\) norm)
but it can be any other norm (or even a pseudo-norm), in particular for large-dimensional datasets – for instance the \(\ell_1\) norm.
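A minimal numpy sketch of the two ingredients above – the \(\ell_2\) distance \(d\) and the regularised target \(T(M) = d(M(X),A) + \alpha f(M)\) – using an illustrative linear model in place of \(M(X)\) and an \(\ell_2\) penalty for \(f\) (arbitrary choices):

```python
import numpy as np

def d(a, b):
    """Euclidean (l2) distance between two vectors."""
    return np.sqrt(np.sum((a - b) ** 2))

def T(M, X, A, alpha=0.0):
    """Target function: mismatch between model output and known answers A,
    plus an optional regularisation term f(M) (here the squared l2 norm of M)."""
    w, b = M[:-1], M[-1]
    prediction = X @ w + b            # toy linear model standing in for M(X)
    return d(prediction, A) + alpha * np.sum(M ** 2)
```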
We are dealing with a HUGE search space, so we need to compute the derivative of \(T\) with respect to \(M_p\):
\[ \nabla M = \frac{\partial T}{\partial M_p}\]
which is a vector of dimension \(P\); depending on the form of \(M\), it can be a very complex function.
\(\Rightarrow\) automatic differentiation comes to the rescue.
then, a single step is taken in the downward direction: \[ M_{n+1} = M_n - \gamma\, \nabla M (M_n)\] - steps are iterated until convergence (a full pass over the data is called an epoch).
- \(\gamma\) (the learning rate) is a small number which ensures the convergence
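A minimal sketch of this update rule, assuming PyTorch for the automatic differentiation and a toy least-squares problem (data, model and learning rate are all illustrative):

```python
import torch

X = torch.randn(100, 3)                   # toy data
A = torch.randn(100)                      # known answers (supervised case)
M = torch.zeros(4, requires_grad=True)    # P = 4 parameters, tracked for autodiff

gamma = 0.01                              # the learning rate
for step in range(200):                   # iterate until (hopefully) convergence
    prediction = X @ M[:3] + M[3]
    T = torch.sqrt(torch.sum((prediction - A) ** 2))  # target function
    T.backward()                          # automatic differentiation fills M.grad
    with torch.no_grad():
        M -= gamma * M.grad               # one step in the downward direction
        M.grad.zero_()
```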
Computing the gradient \(\nabla M\) over the whole dataset is usually too expensive,
\(\Rightarrow\) we restrict the computation to a randomly selected subset of the data, called a mini-batch
Many improvements are possible :
With SGD, the convergence is efficient, however it is not monotonic
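The only change with respect to plain gradient descent is that each step evaluates the gradient on a random mini-batch; a sketch, again with illustrative toy data:

```python
import torch

X = torch.randn(10_000, 3)                # a larger toy dataset
A = torch.randn(10_000)
M = torch.zeros(4, requires_grad=True)
gamma, batch_size = 0.01, 64

for step in range(1000):
    idx = torch.randint(0, len(X), (batch_size,))   # random mini-batch of examples
    T = torch.sqrt(torch.sum((X[idx] @ M[:3] + M[3] - A[idx]) ** 2))
    T.backward()
    with torch.no_grad():
        M -= gamma * M.grad               # cheap but noisy step: convergence is not monotonic
        M.grad.zero_()
```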
(figure source: Wikipedia)
The target function value \(T(M_p)\) draws a surface depending on the parameters
which can be convex or not
convex / non-convex, here in 2 dimensions
\(L_1\) and \(L_2\) are convex, but the convexity of the problem also depends on the model.
a lot of data!
A dataset is a set of points in a multidimensional space.
The number of dimensions can be large !
tabulated datasets
\(\Rightarrow\) statistical approaches well adapted
PCA / LDA — SVM — Random Forests — …
Not trendy but very efficient !
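As a sketch of this classical route, assuming scikit-learn and one of its small built-in tabulated datasets:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)               # a small tabulated dataset
clf = make_pipeline(PCA(n_components=2),         # dimensionality reduction
                    RandomForestClassifier())    # classical, very efficient classifier
print(cross_val_score(clf, X, y).mean())         # cross-validated accuracy
```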
unstructured datasets
\(\Rightarrow\) Deep Learning approaches well adapted
Deep Neural Networks (DNN)
Very trendy ! \(\qquad\) AI 😢
scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Based on open-source tools in Python:
- numpy : tools for general mathematical and array handling
- scipy : advanced mathematical and statistical tools
- matplotlib : generic plotting library
- Pandas : tabulated data handling

Also for Deep Neural Networks (DNN):
- Pytorch : originally created by academics, developed by Facebook, released in 2016
- Tensorflow : developed by Google, released in 2015

Typically, each neuron has its own set of weights \(W_i\), one per input, and possibly parameters for \(f()\)
For a layer of \(K\) neurons connected to an input layer of \(L\) neurons, the number of parameters is proportional to \(K \times L\)
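This \(K \times L\) scaling is easy to check on a single fully-connected layer; a sketch with illustrative sizes, assuming PyTorch:

```python
import torch.nn as nn

K, L = 50, 100                      # illustrative layer sizes
layer = nn.Linear(L, K)             # fully connected layer: L inputs, K neurons
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)                     # K*L weights + K biases = 5050
```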
what is deep ?
In my first slide, we had a 3-layer NN \(\equiv\) a perceptron
it was the practical maximum before automatic differentiation !
\(\Rightarrow\) probably \(\approx\) 1000 parameters
Modern DNNs have dozens to hundreds of layers, each composed of thousands of neurons.
Example of an image-classifier DNN (figure source: Wikipedia)
Non-convex ! \(\Rightarrow\) Algo: SGD – Stochastic Gradient Descent
Many parameters \(\Rightarrow\) training on a large series of examples
BUT it is a perfect Black-Box
(figure legend: s: structures, r: residues, c: channels, x: coordinates)
Is the key element !
nice results often hide awful artefacts
a data model
Everything You Always Wanted to Know About Sex
(But Were Afraid to Ask)
Everything You Always Wanted to Know About ML
(But Were Afraid to Ask)
type the first word, then simply click systematically on the words proposed by your phone
on my Android phone…
Hello,sorry for the late reply and delete the message and any attachments are confidential and intended solely for the addressees to whom they are to the intended recipient only and is now totaly bricked and does not even react to the on the other hand I can miss the talks on the afternoon of the recipients
Would you please post this announcement of an upcoming virtual NMR conference for the message if altered commentaires de l’ANR aimerait des détails sur les frais de consommables
Can I ask you for the addressees for my jsme commentaires commentaires du moulin and commentaires de Laura
Interestingly, the program mixes different languages I use on my phone (English, French and even Czech – “jsme” in the last poem, prompted by “my” : “my jsme” \(\Rightarrow\) “we are” in Czech)
Hidden Markov Process
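The phone keyboard behaves essentially like this: a table of next-word frequencies from which the most likely continuation is proposed. A minimal sketch of such a first-order Markov predictor, on a made-up toy corpus (both the corpus and the "always click the first suggestion" strategy are assumptions):

```python
from collections import defaultdict, Counter

corpus = "the cat sat on the mat and the cat ate the fish".split()  # toy corpus

# first-order Markov model: count which word follows which
follows = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    follows[w1][w2] += 1

# generate a "poem" by always clicking the most frequently proposed word
word, text = "the", ["the"]
for _ in range(8):
    proposals = follows[word].most_common(1)
    if not proposals:          # no known continuation
        break
    word = proposals[0][0]
    text.append(word)
print(" ".join(text))
```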
We are impressed by a seemingly intelligent machine which produces meaningful text. Because the text is meaningful, it seems that the machine is intelligent and understands what it says.
\(\Rightarrow\) there is a discontinuity in the process: Emergence
We tend to attribute to the machine the same inner world that inhabits us
It is a well-known perception bias, in particular with image-processing techniques.
The problem is the lack of a clear measure of text quality. When such a measure is used, it has been shown that all improvements are incremental.
A formidable language tool which can correct / summarize / develop / translate / etc…
BUT NOT A KNOWLEDGE BASE !
Law of large numbers
With a large number of draws, every stochastic law becomes predictive
Law of truly large numbers:
even VERY unlikely events will occur
All points are at the same distance !
to the center - to each other
All random matrices are invertible !
and their transpose is (almost) their inverse
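Both effects are easy to check numerically; a sketch assuming numpy, with matrix entries scaled by \(1/\sqrt{d}\) so that the near-orthogonality shows up directly:

```python
import numpy as np

d = 2000                                  # a large dimension (illustrative)
X = np.random.randn(1000, d)              # 1000 random points

# distances to the center concentrate around a single value
r = np.linalg.norm(X, axis=1)
print(r.std() / r.mean())                 # ~1-2 %: all points (almost) at the same distance

# a random matrix with entries scaled by 1/sqrt(d) is (almost surely) invertible,
# and its transpose behaves approximately as its inverse: M @ M.T ~ identity
M = np.random.randn(d, d) / np.sqrt(d)
print(np.abs(M @ M.T - np.eye(d)).mean()) # small, of order 1/sqrt(d)
```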
M-A Delsuc – Introduction to ML principles – 13/05/24 – CC BY-SA