Winner: None
25 Dec 2016

Everybody remembers where they were when they learned about 9/11. Me, I was walking down a corridor in my school and saw a cluster of people sitting in silence in front of a TV that showed flames and smoke over the New York skyline, which I recognized instantly because I had been to New York with my parents as a kid and we had visited the Twin Towers together. My first thought was: "There's going to be a war".

On November 9 I was at my friend's house in João Pessoa, a couple of days after I started my Brazil climbing trip. Right after I woke up, I switched on my phone to check the results on the Le Monde website. The main headline was in French but the words were not making any sense to me:

"Les explications de la large victoire de Trump".
("Explanations of the overwhelming Trump victory")

My first reaction was not surprise, but disbelief. I stared at the words on the screen, still half asleep, until I convinced myself that what they represented was the truth. But everything I had learned about this election during the past couple of months contradicted what I was reading. Newspapers from three different countries, the 250+ people I was following on Twitter, all the blog posts: they had all predicted a Clinton victory. And they were all wrong. Not just a little bit wrong, as in "we missed some of the nuances", but completely, utterly wrong, as in "there is no FAIL gif powerful enough to say how fucking wrong we were".

I was feeling like a child who has just learned that Santa Claus doesn't exist and that all his friends had been conspiring to make fun of him all along. In a word, I was feeling pretty fucking stupid. Stupid, betrayed and pissed.

From one day to the next, I stopped checking my Twitter feed or reading newspapers. If all those fucking pundits have been so damn wrong about something so basic as predicting the outcome of an election (simple binary classification, in machine learning terms), then I am probably better off without their opinions. I am pissed at the Internet, which I trusted so much, and cannot trust anymore. The Internet I used to know is in a sorry state.

The Internet has become mainstream, turning everyone into an attention whore selling clickbait links for likes and follows. We have become experts at personal branding, professional contestants in a big fucking reality show. Wannabe startups battle for our available brain time. Self-proclaimed experts package bullshit in 140-character haikus. We are all contestants and jury members in a gigantic, absurd beauty pageant, competing with our best friends and everyone we love most for the biggest number of likes on our Facebook timeline.

The Internet wasn't always like that. It used to be a place for nerds, before the word "geek" became a meaningless complimentary adjective synonymous with "I don't understand shit about how the Internet works but I know how to pronounce GIF". Sure, nerds were the ones who invented status badges, kitten videos and The Facebook. But at least they despised Mainstream. Mainstream means over-simplification. Mainstream means the least common denominator, it means idiocracy and the end of creativity. Mainstream means having to choose between a Trump and a Clinton.

Large swaths of the Internet have become the equivalent of mainstream TV. They have turned into tiny bubbles, echo chambers where minority opinions that stray too far from the local status quo get close to no exposure. Politicians thrive in this environment: they have always been experts at toying with mainstream media. They used to be clueless about social media, but oh boy, look how far they've come.

2016 has been bad enough for America, but wait until you see what 2017 has in store for France. Voters in the presidential election are going to have a choice between ideologies sitting at the unappealing ends of a political spectrum made of the establishment, corruptibility, cronyism, corporate influence, deceit and racial hate. It's going to be a freak show, with participants all fighting over the attention of a few disgusted visitors. It's going to be bad, but what can we do about it? It's the only game in town, and in that game the only way not to lose is not to play.

It's depressing. But hey, you know what? There is hope. Of course there is. Because I'm not so naive as to believe it's really that bad. After all, as a software engineer, I know that the Internet is more than a content delivery platform: it's a technology. My disappointment and feeling of betrayal do not come from the Internet, but from the narrow, lazy, brainless usage of the Internet I've cornered myself into over the last couple of years. There is hope because the Internet is much, much bigger than that. And for a software engineer it's even better: I have the necessary skills to contribute to the Internet and to improve it, to make it more like I want it to be.

So yes, hope.


True HTTPS for everyone, at last
12 Dec 2015

HTTPS is now both free and extremely easy to set up for website owners, thanks to the efforts of the Let's Encrypt initiative. SSL certificates used to cost $100+/year and were a pain to install on a webserver. From the moment I learned that Let's Encrypt's public beta was open, it took me less than 10 minutes to replace my crappy self-signed SSL certificate with a certificate recognized by a legitimate certificate authority. Yes, it's that easy.

I don't like to post step-by-step tutorials on this website, but I believe the sheer simplicity of the install is a real game changer. See for yourself.

(The following instructions are for a website served by Nginx on Ubuntu.)

1) Generate a certificate

Fetch Let's Encrypt:

# Instructions at https://github.com/letsencrypt/letsencrypt/
git clone https://github.com/letsencrypt/letsencrypt && cd letsencrypt

Write a tiny renew.sh bash script for generating the mydomain.com certificate:

#!/bin/bash
# Stop Nginx so the standalone authenticator can bind to ports 80 and 443
service nginx stop
/path/to/letsencrypt/letsencrypt-auto certonly --standalone -d mydomain.com --renew-by-default
# Start Nginx again, now serving the fresh certificate
service nginx start

Generate your first ever legit SSL certificate (yay!):

./renew.sh

Automate the certificate renewal:

$ sudo crontab -e
# m  h  dom mon dow   command
0  0   *   *  1     /path/to/letsencrypt/renew.sh

2) Configure Nginx

Configure an Nginx site in /etc/nginx/sites-enabled/mydomain:

server {
    # Redirect http calls to https
    listen 80;
    server_name mydomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl;
    server_name mydomain.com;

    ssl_certificate /etc/letsencrypt/live/mydomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/mydomain.com/privkey.pem;

    location / {
        # The rest of your configuration goes here
        ...
    }
}

3) Bonus settings

The Nginx settings require some tuning to obtain an A grade on the SSL test:

http {
    ...

    ssl_prefer_server_ciphers on;

    # Enable perfect forward secrecy and disable RC4 (https://community.qualys.com/blogs/securitylabs/2013/03/19/rc4-in-tls-is-broken-now-what)
    ssl_ciphers "EECDH+ECDSA+AESGCM EECDH+aRSA+AESGCM EECDH+ECDSA+SHA384 EECDH+ECDSA+SHA256 EECDH+aRSA+SHA384 EECDH+aRSA+SHA256 EECDH+aRSA+RC4 EECDH EDH+aRSA RC4 !aNULL !eNULL !LOW !3DES !MD5 !EXP !PSK !SRP !DSS !RC4";

    # Disable SSLv3 because of POODLE vulnerability: http://disablessl3.com/
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    # Use custom Diffie-Hellman group for a strong key exchange: https://weakdh.org/sysadmin.html
    # Key was generated via "openssl dhparam -out /etc/nginx/dhparams.pem 2048"
    ssl_dhparam /etc/nginx/dhparams.pem;
}

For instance, the SSL report for my mail webclient domain can be found here.


Clean Code
21 Feb 2015

What is good code? This is a question that we ask ourselves every time we try to fix buggy software, refactor dirty code or comment on someone else's code, e.g. during a code review. The problem is that there is no perfect answer to that question. There is no widely accepted metric for code quality. After all, software development is not an exact science. So everyone has a definition of "clean" code that is pretty personal, and very often quite controversial.

I don't think I'm too dogmatic when it comes to code writing. Now, I must say I'm a big fan of test-driven development:

  1. Write a tiny test that breaks
  2. Write a tiny bit of code that fixes the test
  3. Refactor
  4. Go back to step 1
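For instance, a single iteration of that cycle might look like the following Python sketch (slugify and its expected behaviour are made up for the sake of the example):

# 1. A tiny test that breaks, because slugify does not exist yet
def test_slugify_replaces_spaces_with_dashes():
    assert slugify("clean code") == "clean-code"

# 2. The tiny bit of code that fixes the test
def slugify(title):
    return title.replace(" ", "-")

# 3. Refactor if needed, then back to step 1 with the next tiny test
# (lowercasing, stripping punctuation, and so on).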

However, I seldom follow this methodology to the letter. The tests I write are a little bigger than they should be; the code that fixes the test often does a little more than that; I usually don't refactor before the entire feature is implemented.

When I'm working on my own (not in a team) with a language or a framework that I know very well, I often don't write any test at all and I follow README-driven development:

  1. Write the README
  2. Write the code

And sometimes when I need to quickly get some hacky code out the door, I just put my Programming Motherfucker t-shirt on:

  1. Write some fucking code.

But as far as I am concerned, you don't need to follow any of these methodologies to write genuinely clean, good code. Again, dogma is not my cup of tea. However, during code reviews there are two very specific subjects on which I will not be open to discussion at all. Basically, I'm a fucking fascist about enforcing them:

  1. No code duplication
  2. Short functions

I'm pretty much open to discussion when it comes to writing (or not writing) unit tests, comments, picking variable names, or enforcing code style. But I will simply not negotiate with copy-pasting and monstrous-function-writing developers. During a code review, if both parties disagree on the best way to write a piece of code, it often turns out to be for ideological reasons. And I'm all for taking the other person's point of view, and even embracing a new code-writing ideology, except when it comes to these two very specific points. For instance, I will have no empathy whatsoever for developers who claim that "this piece of code might be duplicated, but it's more explicit this way". Neither will I ever understand why "longer functions are clearer because you don't have to navigate between functions to understand what they actually do."1
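To make it concrete, here is a deliberately trivial Python sketch (the functions and fields are invented for the example, not taken from any real codebase). The duplicated version forces you to fix every bug twice; the factored version keeps every function short and gives the shared logic a name:

# Before: the same normalisation logic is copy-pasted in two functions.
def import_user(raw):
    email = raw["email"].strip().lower()
    name = raw["name"].strip().title()
    return {"email": email, "name": name}

def import_admin(raw):
    email = raw["email"].strip().lower()
    name = raw["name"].strip().title()
    return {"email": email, "name": name, "is_admin": True}

# After: the shared logic lives in one short, named function.
def normalize(raw):
    return {"email": raw["email"].strip().lower(),
            "name": raw["name"].strip().title()}

def import_user(raw):
    return normalize(raw)

def import_admin(raw):
    user = normalize(raw)
    user["is_admin"] = True
    return user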

Of course, I'm not the only moron who thinks code duplication and long functions hurt code quality, and there are many arguments against them. They were all summarised pretty well by Robert C. Martin, and I don't think it's necessary to duplicate them here (pun intended). It's just that if you would like to challenge these assumptions and have me spend my time arguing against them, then I'm probably not the right person to review your code.

[1] That one is an actual quote.


The day deep learning took computer vision by storm
1 Sep 2014

The first time I heard about "deep learning" was a couple of days before the 2012 European Conference on Computer Vision (ECCV 2012) took place in Firenze, Italy. At the time I was working as a computer vision scientist with an interest in image detection and classification. I was especially interested in the PASCAL and ImageNet image classification challenges, which take place every year.

Shortly before I boarded my plane to Italy, word had come out that one of the contestants in the ImageNet challenge had results that vastly outperformed all others. This was unusual for two reasons. First, it's very rare that the winner of one of the challenges outperforms the second, or even the third performer, by more than a couple of percentage points. There isn't much secrecy between the teams, so they all (kinda) pick their ideas from a common pool of efficient, published methods. Winning teams are usually the ones that have applied one or more interesting tweaks to an existing method. And that's the second reason why the result was surprising: that year, the winning team of ImageNet had used neural networks for image classification. Now, in computer vision, the last time someone had produced interesting results involving neural networks was in the 1990s -- prehistory, in other words.

The ECCV workshop during which contestants presented their methods and winners received their awards was taking place on the very last day of the conference, and I remember it quite vividly. First, the results for the PASCAL challenge were presented: nothing too surprising there. Progressively, the small, half-empty room starts to fill up. After a couple of hours, when it's finally time to present the ImageNet results, the room is packed: all seats are occupied and there are people sitting on the floor or standing as far back as the doorway.

Ass kicking time

Finally, it was time for the "neural network thing" presentation. Alex Krizhevsky gets on stage. So the whole room (and corridor) starts listening to this pale, thin, glasses-wearing, extremely nerdy PhD student who speaks in very precise and factual terms. And this guy proceeds to blow our minds and kick our collective asses. This is what it looked like.

We need to clarify something: the ImageNet challenge is probably the most difficult computer vision task for which there exists a publicly available dataset. It's a massive dataset of 1.2 million training images divided into a thousand classes. If you want to participate, your classifier will have to be able to distinguish an Australian Terrier from a Yorkshire Terrier -- among other things.

Believe me, it's pretty fucking hard.

So this Alex guy is on stage and delivers his presentation. Quantitatively speaking, the results outperform the second best method by about ten percentage points of error -- a relative improvement of roughly 40%.

SuperVision performance

That's big. Huge, actually. Scientists witness such an improvement maybe once in a lifetime. This sort of performance increase is not supposed to occur in mature research fields. And the visual results are stunning, too (look at the last slides).

Krizhevsky's and Hinton's approach is fully described in their now-famous NIPS 2012 paper. The science behind the proposed solution seems pretty straightforward, but the genius lies in the implementation of the deep neural network: in the presentation, Alex describes at great length his GPU implementation, which is the actual breakthrough behind the great results he obtained -- this, and the sheer amount of labelled training data made available in ImageNet. In other words, previous public computer vision datasets simply did not provide enough labelled training data for deep neural networks, which have a very large number of parameters. By the way, in his presentation Alex made it abundantly clear that, despite the size of ImageNet, his deep neural network was still overfitting the data. I find this exhilarating.

"Where did I put this old neural networks book again?"

So Alex ends his presentation, the audience is stunned and there is some clapping. Many are thinking: "Have we worked all this time for nothing? Should we throw away most computer vision papers from the past decade?" It's time for the traditional question session. Pretty rapidly, the questions acquire an aggressive undertone:

How would your approach work on a more realistic, difficult dataset?

What do your results say about the quality of the ImageNet dataset?

The organisers of the ImageNet challenge are suddenly under fire from some very well-respected figures of the computer vision community. The reasoning is that if such a simple method works where so many brilliant people have tried such complex ideas, then the dataset on which the experiment was conducted must be flawed. As the manager of the ImageNet challenge, Li Fei Fei is taking most of the heat. Yann LeCun, a long-time proponent of neural networks, is there too and most questions are now directed at him. I remember her asking him:

How is the method you just described conceptually different from your 1988 paper?

To which Yann answers:

It's exactly the same method!

I got really depressed by that.

Aftermath

Of course, we now know that the success of deep neural networks on ImageNet was not an isolated, singular event. Two years later, deep convolutional neural networks (CNNs) have outperformed more traditional methods on most computer vision problems. It's a revolution. Yann LeCun and Geoffrey Hinton have been acqui-hired by Facebook and Google, respectively. And the ImageNet challenge is doing just fine, thank you.

I consider it a privilege to have attended this particular presentation; I could have learned about the science behind CNNs just by reading papers, but what I had not foreseen was the incredible resistance that the computer vision community put up against CNNs in the beginning. Scientists are the makers of change, yet they could not accept the wave of change that was crashing over them. Hey, they are humans too, right? Right?


Prodigal
9 Sep 2013

tl;dr: I have just released Prodigal, a static website generator written in Python for building multilingual websites.

Static sites are the fixies of the internet

A while ago, I found myself rewriting the Nuli! 努力! website from scratch. This website does not need any backend: it's just a landing page that redirects you to the Chinese-learning app. So naturally I found myself looking for a static website generator. My favorite language for the web is Python, so I already knew about Hyde.

The problem with Hyde is that it's pretty limited with regard to multilingual content. I needed to generate both English and French versions of the website with identical layout and I really wasn't about to duplicate a lot of HTML code. I had also gotten used to website localization with Django. So what I did was a quick-and-dirty Django static website; I simply made render_to_string calls with appropriate HttpRequest objects.

However, Django is not really meant to be run without a database, so this temporary solution was rather hackish. In fact, all I needed was the localization backend of Django, which is itself based on Babel, and a template-rendering language; I picked Jinja2 for its similarity to the Django template-rendering language. And so was born Prodigal! I also added a SimpleHTTPServer-based server for good measure.
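The core of that combination fits in a handful of lines. Here is a rough sketch of rendering a localized page with Jinja2 and a Babel gettext catalog (the directory names, template name and locales are placeholders for the example, not Prodigal's actual API):

from jinja2 import Environment, FileSystemLoader
from babel.support import Translations

def render(template_name, locale):
    # Load the compiled gettext catalog for the requested locale
    translations = Translations.load("locale", [locale])
    env = Environment(loader=FileSystemLoader("templates"),
                      extensions=["jinja2.ext.i18n"])
    env.install_gettext_translations(translations)
    return env.get_template(template_name).render()

# Generate one page per language with the same layout
# (assumes the build/<locale> directories already exist)
for locale in ("en", "fr"):
    with open("build/%s/index.html" % locale, "w") as f:
        f.write(render("index.html", locale))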

I'm rather pleased with the result and I intend to keep maintaining this project -- in particular, I'd like to migrate to Python 3, once Babel and Jinja2 have made the transition. So if you have comments and questions, don't hesitate to open a pull request or an issue!