True HTTPS for everyone, at last
12 Dec 2015

HTTPS is now both free and extremely easy to set up for website owners, thanks to the efforts of the Let's Encrypt initiative. SSL certificates used to cost $100+/year and were a pain to install on a web server. From the moment I learned that Let's Encrypt's public beta was open, it took me less than 10 minutes to replace my crappy self-signed SSL certificate with one recognized by a legitimate certificate authority. Yes, it's that easy.

I don't like to post step-by-step tutorials on this website, but I believe the sheer simplicity of the install is a real game changer. See for yourself.

(The following instructions are for a website served by Nginx on Ubuntu.)

1) Generate a certificate

Fetch Let's Encrypt:

# Instructions at https://github.com/letsencrypt/letsencrypt/
git clone https://github.com/letsencrypt/letsencrypt && cd letsencrypt

Write a tiny renew.sh Bash script that generates the mydomain.com certificate:

#!/bin/bash
# The standalone authenticator needs to bind to ports 80/443, so stop Nginx first
service nginx stop
/path/to/letsencrypt/letsencrypt-auto certonly --standalone -d mydomain.com --renew-by-default
service nginx start

Generate your first ever legit SSL certificate (yay!):

./renew.sh

Automate the certificate renewal:

$ sudo crontab -e
# m  h  dom mon dow   command
0  0   *   *  1     /path/to/letsencrypt/renew.sh

2) Configure Nginx

Configure an Nginx site in /etc/nginx/sites-enabled/mydomain:

server {
    # Redirect HTTP calls to HTTPS
    listen 80;
    server_name mydomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl;
    server_name mydomain.com;

    ssl_certificate /etc/letsencrypt/live/mydomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/mydomain.com/privkey.pem;

    location / {
        # The rest of your configuration goes here
        ...
    }
}

3) Bonus settings

The Nginx settings require some tuning to obtain an A grade on the Qualys SSL Labs test:

http {
    ...

    ssl_prefer_server_ciphers on;

    # Enable perfect forward secrecy and disable RC4 (https://community.qualys.com/blogs/securitylabs/2013/03/19/rc4-in-tls-is-broken-now-what)
    ssl_ciphers "EECDH+ECDSA+AESGCM EECDH+aRSA+AESGCM EECDH+ECDSA+SHA384 EECDH+ECDSA+SHA256 EECDH+aRSA+SHA384 EECDH+aRSA+SHA256 EECDH+aRSA+RC4 EECDH EDH+aRSA RC4 !aNULL !eNULL !LOW !3DES !MD5 !EXP !PSK !SRP !DSS !RC4";

    # Disable SSLv3 because of POODLE vulnerability: http://disablessl3.com/
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    # Use custom Diffie-Hellman group for a strong key exchange: https://weakdh.org/sysadmin.html
    # Key was generated via "openssl dhparam -out /etc/nginx/dhparams.pem 2048"
    ssl_dhparam /etc/nginx/dhparams.pem;
}

For instance, the SSL report for my mail webclient domain can be found here.


Clean Code
21 Feb 2015

What is good code? This is a question that we ask every time we try to fix buggy software, refactor dirty code or comment on someone else's code, e.g. during a code review. The problem is that there is no perfect answer to that question. There is no widely accepted metric for code quality. After all, software development is not an exact science. So everyone's definition of "clean" code is pretty personal, and very often quite controversial.

I don't think I'm too dogmatic when it comes to code writing. Now, I must say I'm a big fan of test-driven development:

  1. Write a tiny test that breaks
  2. Write a tiny bit of code that fixes the test
  3. Refactor
  4. Go back to step 1

However, I seldom follow this methodology to the letter. The tests I write are a little bigger than they should be; the code that fixes the test often does a little more than that; I usually don't refactor before the entire feature is implemented.
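
For illustration only, here is roughly what one iteration of that loop looks like in Python with pytest (the slugify function and its spec are invented for this example):

# 1. Write a tiny test that breaks (slugify does not exist yet)
def test_slugify_replaces_spaces_with_dashes():
    assert slugify("Clean Code") == "clean-code"

# 2. Write just enough code to make the test pass
def slugify(title):
    return title.lower().replace(" ", "-")

# 3. Refactor (nothing to clean up yet), then loop back to step 1
#    with the next tiny test, e.g. handling punctuation.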

When I'm working on my own (not in a team) with a language or a framework that I know very well, I often don't write any test at all and I follow README-driven development:

  1. Write the README
  2. Write the code

And sometimes when I need to quickly get some hacky code out the door, I just put my Programming Motherfucker t-shirt on:

  1. Write some fucking code.

But as far as I am concerned, you don't need to follow any of these methodologies to write actually clean, good code. Again, dogma is not my cup of tea. However, during code reviews there are two very specific subjects on which I will not be open to discussion at all. Basically, I'm a fucking fascist about enforcing them:

  1. No code duplication
  2. Short functions

I'm pretty much open to discussion when it comes to writing (or not writing) unit tests, comments, picking variable names, or enforcing code style. But I will simply not negotiate with copy-pasting and monstrous-function-writing developers. During a code review, if both parties disagree on the best way to write a piece of code, it often turns out to be for ideological reasons. And I'm all for taking the other person's point of view, and even embracing a new code-writing ideology, except when it comes to these two very specific points. For instance, I will have no empathy whatsoever for developers who claim that "this piece of code might be duplicated, but it's more explicit this way". Neither will I ever understand why "longer functions are clearer because you don't have to navigate between functions to understand what they actually do."1
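
To make those two rules concrete, here is a purely illustrative sketch (the import functions and field names are hypothetical) of the kind of change I push for during a review:

# Before: the same normalisation logic is copy-pasted into two long functions
def import_customer(record):
    name = record["name"].strip().lower()
    email = record["email"].strip().lower()
    # ... dozens more lines ...

def import_supplier(record):
    name = record["name"].strip().lower()
    email = record["email"].strip().lower()
    # ... dozens more lines ...

# After: the shared logic lives in one short, named function
def normalise(value):
    return value.strip().lower()

def import_customer(record):
    name, email = normalise(record["name"]), normalise(record["email"])
    # ...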

Of course, I'm not the only moron who thinks code duplication and long functions hurt code quality, and there are many arguments against them. They were all summarised pretty well by Robert C. Martin and I don't think it's necessary to duplicate them here (pun intended). It's just that if you would like to challenge these assumptions and have me spend my time arguing about them, then I'm probably not the right person to review your code.

[1] That one is an actual quote.


The day deep learning took computer vision by storm
1 Sep 2014

The first time I heard about "deep learning" was a couple of days before the 2012 European Conference on Computer Vision (ECCV 2012) took place in Firenze, Italy. At the time I was working as a computer vision scientist with an interest in image detection and classification; I was especially interested in the PASCAL and ImageNet image classification challenges that take place every year.

Shortly before I boarded my plane to Italy, word had come out that one of the contestants in the ImageNet challenge had results that vastly outperformed all others. This was unusual for two reasons. First, it's very rare that the winner of one of these challenges outperforms the second, or even the third, performer by more than a couple of percentage points. There isn't much secrecy between the teams, so they all (kinda) pick their ideas from a common pool of efficient, published methods. Winning teams are usually the ones that have applied one or more interesting tweaks to an existing method. And that's the second reason why the result was surprising: that year, the winning team of ImageNet had used neural networks for image classification. Now, in computer vision, the last time someone had produced interesting results involving neural networks was in the 1990s -- prehistory, in other words.

The ECCV workshop during which contestants presented their methods and winners received their awards took place on the very last day of the conference, and I remember it quite vividly. First, the results for the PASCAL challenge were presented: nothing too surprising there. Progressively, the small, half-empty room starts to fill up. After a couple of hours, when it's finally time to present the ImageNet results, the room is packed: all seats are taken and people are sitting on the floor or standing as far back as the doorway.

Ass kicking time

Finally, it was time for the "neural network thing" presentation. Alex Krizhevsky gets on stage, and the whole room (and corridor) starts listening to this pale, thin, glasses-wearing, extremely nerdy PhD student who speaks in very precise and factual terms. And this guy proceeds to blow our minds and kick our collective asses. This is what it looked like.

We need to clarify something: the ImageNet challenge is probably the most difficult computer vision task for which there exists a publicly available dataset. It's a massive dataset of 1.2 million training images divided into a thousand classes. If you want to participate, your classifier will have to be able to distinguish an Australian Terrier from a Yorkshire Terrier -- among other things.

Believe me, it's pretty fucking hard.

So this Alex guy is on stage and delivers his presentation. Quantitatively speaking, the results reduce the error rate of the second-best method by roughly 40%, which amounts to about ten percentage points.

[Image: SuperVision performance]

That's big. Huge, actually. Scientists witness such an improvement about once in a lifetime. This sort of performance increase is not supposed to occur in a mature research field. And the visual results are stunning, too (look at the last slides).

Krizhevsky, Sutskever and Hinton's approach is fully described in their now-famous NIPS 2012 paper. The science behind the proposed solution seems pretty straightforward, but the genius lies in the implementation of the deep neural network: in the presentation, Alex describes at great length his GPU implementation, which is the actual breakthrough behind the great results he obtained -- this, and the sheer amount of labelled training data made available in ImageNet. In other words, previous public computer vision datasets simply did not provide enough labelled training data for deep neural networks, which have a very large number of parameters. By the way, in his presentation Alex made it abundantly clear that, despite the size of ImageNet, his deep neural network was still overfitting the data. I find this exhilarating.

"Where did I put this old neural networks book again?"

So Alex ends his presentation, the audience is stunned and there is some clapping. Many are thinking: "Have we worked all this time for nothing? Should we throw away most computer vision papers from the past decade?" It's time for the traditional question session. Pretty rapidly, the questions acquire an aggressive undertone:

How would your approach work on a more realistic, difficult dataset?

What do your results say about the quality of the ImageNet dataset?

The organisers of the ImageNet challenge are suddenly under fire from some very well-respected figures of the computer vision community. The reasoning is that if such a simple method works when so many brilliant people have tried such complex ideas, then the dataset on which the experiment was conducted must be flawed. As the manager of the ImageNet challenge, Fei-Fei Li is taking most of the heat. Yann LeCun, a long-time proponent of neural networks, is there too, and most questions are now directed at him. I remember Fei-Fei Li asking him:

How is the method you just described conceptually different from your 1988 paper?

To which Yann answers:

It's exactly the same method!

I got really depressed by that.

Aftermath

Of course, we now know that the success of deep neural networks on ImageNet was not an isolated, singular event. Two years later, deep convolutional neural networks (CNNs) have outperformed more traditional methods on most computer vision problems. It's a revolution. Yann LeCun and Geoffrey Hinton have been acqui-hired by Facebook and Google, respectively. And the ImageNet challenge is doing just fine, thank you.

I consider it a privilege to have attended this particular presentation. I could have learned about the science behind CNNs just by reading papers, but what I had not foreseen was the incredible resistance that the computer vision community put up against CNNs in the beginning. Scientists are the makers of change, yet they could not accept the wave of change that was crashing over them. Hey, they are humans too, right? Right?


Prodigal
9 Sep 2013

tl;dr: I have just released Prodigal, a static website generator written in Python for building multilingual websites.

A while ago, I found myself rewriting the Nuli! 努力! website from scratch. This website does not need any backend: it's just a landing page that redirects you to the Chinese-learning app. So naturally I went looking for a static website generator. My favorite language for the web is Python, so I already knew about Hyde.

The problem with Hyde is that it's pretty limited with regard to multilingual content. I needed to generate both English and French versions of the website with identical layouts, and I really wasn't about to duplicate a lot of HTML code. I had also gotten used to website localization with Django. So what I did was build a quick-and-dirty static website with Django: I simply made render_to_string calls with appropriate HttpRequest objects.

However, Django is not really meant to be run without a database, so this temporary solution was rather hackish. In fact, all I needed was a localization backend like Django's -- Babel provides exactly that -- and a template-rendering language; I picked Jinja2 for its similarity to the Django template language. And so Prodigal was born! I also added a SimpleHTTPServer-based development server for good measure.
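
As a rough sketch of how those two pieces fit together -- this is not Prodigal's actual code, and the directory names are made up:

from babel.support import Translations
from jinja2 import Environment, FileSystemLoader

# Load the gettext catalog compiled by Babel for the requested language
translations = Translations.load("locale", ["fr"])

# Jinja2's i18n extension exposes {% trans %} blocks and the _() helper
env = Environment(
    loader=FileSystemLoader("templates"),
    extensions=["jinja2.ext.i18n"],
)
env.install_gettext_translations(translations)

# Render the same template once per language to produce a static page
html = env.get_template("index.html").render()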

I'm rather pleased with the result and I intend to keep maintaining this project -- in particular, I'd like to migrate to Python 3 once Babel and Jinja2 have made the transition. So if you have comments or questions, don't hesitate to open a pull request or an issue!


Innovation 101
21 Oct 2013

If you can't rely on your eureka moment — and you shouldn't — then what's left to do to produce some actually innovative work? First, you need to admit something.

You suck

Your genius idea is most probably not new. You are working on a subject that has interested hundreds, thousands, perhaps billions of people in the history of science and humankind. You are not more clever than all of them; in fact, in terms of intelligence and experience, you probably don't even make the top 10%. The probability that you have all of a sudden thought of something that not a single one of them has already explored is infinitesimal. If you decide to pursue your initial idea anyway, you will eventually realize that it either sucks or has been done before. Most probably both.

However, what is possible, and in fact quite likely, is that you may have combined multiple ideas and concepts that did not originate from you and the result of this combination might be innovative. If you're lucky, then this combination might even prove to be powerful, in which case you will have made a breakthrough.

What I'm saying here is nothing new, of course.

Standing on the shoulders of giants

The wisest of the philosophers asked: "We admit that our predecessors were wiser than we. At the same time we criticize their comments, often rejecting them and claiming that the truth rests with us. How is this possible?" The wise philosopher responded: "Who sees further, a dwarf or a giant? Surely a giant, for his eyes are situated at a higher level than those of the dwarf. But if the dwarf is placed on the shoulders of the giant, who sees further? ... So too we are dwarfs astride the shoulders of giants. We master their wisdom and move beyond it. Due to their wisdom we grow wise and are able to say all that we say, but not because we are greater than they."

Isaiah di Trani (c. 1180 – c. 1250), source: Wikipedia, emphasis mine

No innovation ever happens in an isolated, confined environment. Or perhaps it does, but only in the minds of the most incredible geniuses. And no, I'm not referring to you.

We are all dwarves: our experience of life, the world and science is limited. However, what we can do is study the works of our giant predecessors. If we do, we will find their limitations. We will understand what makes them great, and what can make them even better. Their limitations are your opportunity for innovation. And you will address these limitations by looking for novel ideas, things that the original inventor did not, and could not know about. Perhaps the combination of ideas will not completely work out: again, this will be an opportunity for you.

You might fear that your peers will tell you that you are doing "incremental innovation". This will happen, no doubt about it. But it will happen very rarely if your work is sufficiently powerful. Nobody worth listening to will criticize the lack of innovation of a method that dominates benchmarks and opens the door to a whole new array of applications.

Conclusion? Here we go:

  1. Do your homework and see what others have tried before you.
  2. Pick a problem your own size.
  3. Have great results.

As a #4, I could add "Get ready to defend yourself against aggressive criticism", but that would take a whole new post.