dotSecurity 2016

First session of the dotSecurity conference this year. I've been to a lot of dotConferences in the past years, and they always took place in the most gorgeous venues. This one was no exception. As usual, there is also no wifi in the theater, and attendees are invited not to use their laptops.

Obviously, I did not follow that advice, otherwise you wouldn't have been able to read this short recap.

Making HTTPS accessible

The first talk of the day was by Joona Hoikkala, about https and the hassle it is to properly install and configure it on a server. The overhead is getting so low and the benefits so high in terms of security and privacy that the whole web should be https. It protects end-users from session hijacking, content injection and arbitrary censorship.

Joona

And yes, I get the irony that, at the time of writing, this very website is not served through https. I'll fix that soon.

Another advantage is that http/2 will only work with https. Well, according to the spec, it could also run without it, but in practice no current implementation handles this case, and all need https enabled. It is also said that it should give better SEO results, but as with all things SEO-related, I wouldn't put much faith in that.

Still, the issue today (and that will be my excuse for not having https on this very website) is that it is hard and cumbersome (and expensive!) to get a certificate for each server we own. We could default to a wildcard certificate (something like *.pixelastic.com), but this would just lower the overall security: if one server gets compromised, all of them are insecure.

The solution today is to use Let's Encrypt, which gives easy and free certificates to everybody. It comes with a command line interface and eases the burden of creating, revoking and renewing certificates. More than 2 million certificates have been issued through Let's Encrypt since its creation.

Overall, I must say that this first talk was a bit slow to start. It took me a while to understand that the speaker actually worked for Let's Encrypt. If anything, it gave me one more nudge to move my servers to https.

Life of a vulnerability

The second talk was by Filippo Valsorda, from Cloudflare. He explained in very simple terms how vulnerabilities (or vulns) are discovered, published and fixed. A really nice overview of something I had only heard about from a distance, without knowing how the process really works.

Filippo

He started by defining a vuln as, basically, a state of software that lets users do something they are not allowed to do. The most common forms include DDoS, data leaks, SQL injection and remote code execution.

Not all vulns have a nice name, website, logo and stickers. Most of them don't, actually. When you discover a vulnerability, you have to report it. There are two ways to report them: full disclosure and responsible disclosure.

For full disclosure, you post the vulnerability to a public forum, letting everybody know about it. Such forums are often the target of legal threats from companies that don't want their vulnerabilities publicly accessible. Other people choose to sell their findings instead. Who actually buys these vulns is still a bit obscure to me, but buyers can resell them later or use them for their own profit (either governments or criminal organizations).

For responsible disclosure (also known as coordinated disclosure), you do not post publicly, but contact the owner of the website or service being impacted. You tell them about the issue, the risks and how to fix it. This gives them time to fix the issue before it does too much damage. That is why it is very important for any website or open-source project to have a way to contact the team for security reasons. It can be a simple security@domain.com email address, but it shows that the team will take the matter seriously.

Also, being publicly acknowledged as having found a vuln on major websites like Google or Facebook is very nice for exposure and for building a reputation as a security expert.

Once the vuln is identified, it should be assigned a unique number so everybody knows they are talking about the same issue and can work in a coordinated effort to fix it. Hundreds of vulnerabilities are issued per month; some target only a very specific version of a very specific framework while others have a much bigger impact.

Such identified vulns are publicly announced with the list of affected versions of the software, an explanation of the issue and its real-world impact. Alongside are displayed patches and/or possible fixes, as well as credits.

Then he ended his presentation with something I found a bit weird to do at such a conference. He did a live exploit of a known vulnerability in an old version of Ruby on Rails to get remote code execution. He used Metasploit and Shodan to show just how easy it is to find an insecure machine and run a known exploit against it.

Sure, the machine he targeted was his own, but it wasn't really clear at first. I don't think it was such a good idea to show the audience how easy it is to get access to remote machines and start breaking them, without telling them about the legal implications of doing so (it could get you in jail in some countries). I would have appreciated a bit more context here.

Still, very good talk that taught me a lot.

Multi-factor authentication

The following talk was by Jacob Kaplan-Moss. He told us everything about multi-factor authentication (a fancy way of saying two-factor authentication; I didn't get the difference).

Jacob

Logging in to a website with just a password is no longer enough. We need a secondary channel to confirm that the person doing the action is actually the one we think it is. Computers with saved passwords can be stolen, or left unlocked.

For 2FA, we need another object that is owned by the user trying to do a sensitive action. Most of the time it is a mobile phone with a 2FA application that displays a small code you have to input on the website, but there are other ways.

You could send an SMS to the user, or give them an automated phone call with a special number they need to input. This is called out-of-band communication: you request information from the user on channel A that was sent to them through channel B. This way you make sure you're not talking to an impostor who just stole the real user's laptop.

There are two types of tokens you can exchange that way. Soft tokens are the ones generated by 2FA apps like Google Authenticator or Authy. Hard tokens are physical objects like YubiKeys.
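
To make the soft-token case more concrete, here is a minimal sketch of how apps like Google Authenticator typically derive their codes, assuming a standard TOTP setup (RFC 6238) with a base32 secret shared between the app and the server; this is an illustration, not code from the talk:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32: str, interval: int = 30, digits: int = 6) -> str:
    """Derive the current time-based one-time password from a shared secret."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(time.time()) // interval          # changes every 30 seconds
    msg = struct.pack(">Q", counter)
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                      # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# The server stores the same secret, computes the same value and compares,
# usually accepting the previous/next interval to absorb clock drift.
print(totp("JBSWY3DPEHPK3PXP"))  # example secret, not a real one
```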

All those solutions come with different trade-offs in risk, UX and cost. When choosing one, you should not focus only on the risks. If you pick only the least risky solution, you often end up with the most miserable UX, and users will end up not using your ultra-secure solution because it is just too damn hard to use.

When assessing risk, you have to check if the tokens can be intercepted, if they can be brute-forced, and whether you can notice that a token has been stolen. You also have to check how the app or hardware is protected against malware.

When assessing costs, you have to take into account both the cost for the company (how much does it cost to give YubiKeys to everybody?) as well as the cost for the end-user (will it be cumbersome, will they lose time doing it, etc.).

For example, out-of-band communication through SMS is easy: most of your users already have a phone, and they know how to use SMS. But SMS is also quite easy to intercept, and such a channel can fail in countries where phone network coverage is bad.

For soft tokens through authentication apps, it is much harder to compromise tokens (even if it is possible), and you don't need a network since they work offline. Still, there is a UX cost, because you need to download an external app and manually input the code from your phone.

For hard tokens, the real threat is that the master key used to validate the tokens of all physical keys gets stolen. There is no risk-free solution, and even if such an event is unlikely, if it happens you need to build new keys for everybody and ship them to wherever they are in the world. Cost-wise, it can become really expensive. But in terms of UX, this is the best: you just press a button to prove that you are you.

Whatever method you choose, you have to ask for 2FA not only when users log in, but whenever they request a sensitive action. This could be adding new members to a team, changing passwords or billing information. You should also monitor for weird behavior, like changes in connecting devices or geolocation.

You also need a backup plan. What happens when people lose their phone or key? How can they connect to the service again? There is no perfect answer to that question. You could provide users with backup recovery codes, but then they must store them securely somewhere, which just moves the burden of security onto the average user. Most won't store them at all.

You could allow for a backup phone, but this only increases the attack surface, making everything less secure. You could allow the support team to help, but they will then become open to social engineering. Or you can simply state that there is no way to recover an account if you lose your authentication method. Whatever method you choose, you have to tell your users upfront, or they will be really pissed off when they discover they have no way to get their account back.

The speaker's suggestion was the following setup, which differentiates between public accounts and internal accounts:

For public accounts, 2FA apps are the best trade-off in terms of UX and security. Authy is better than Google Authenticator UX-wise, but more costly for the company. You also have to request the token on each sensitive action. For recovery, don't allow support to recover an account; provide backup codes instead.

For internal accounts, you can greatly improve both security and UX by putting a bit more money on the table and giving a YubiKey to each employee for 2FA. This will prevent outsiders from taking over an employee account. Add behavior analysis, like detecting when a connection does not come from a known workplace, and trigger a 2FA check. Also, only allow a 2FA reset when it is requested face to face with the security team.

I really liked a quote from the speaker during the short Q&A session that came afterwards:

Security should not be handled by a security team. It should be a concern of the day-to-day developer. Just like tests should no longer be in the hands of testing teams but embedded into development through things like TDD.

Lightning talks

After those talks and a break, we continued with a small commercial break, aka the lightning talks.

The first speaker presented, in the form of a git branching workflow, what he called the classical "security audit workflow". You request a security audit on your code, so the security team creates a new security branch to start working on it. During that time, the main develop branch keeps growing, because devs don't stop coding during the audit. Then it's time for the release, so develop gets merged into release. But the security audit wasn't finished, so no security fix was pushed to production. And once the security audit is done, the code on develop has changed so much that the audit is useless.

His solution to that was runtime checking of the code, through his company, Sqreen.

From my point of view, the whole premise was fucked up. He was giving a solution to the wrong problem. Why would you even start a security audit if you're not going to implement the results before shipping? This is the root cause of the issue. Either integrate security testing into the development flow, fix issues before release or open a public bounty program. There are so many other ways to avoid the absurd branching model shown on screen.

I'm not saying Sqreen's product is bad (we even used it at some point), just that what it was trying to solve was not a tech issue but an organizational one.

Then Ori did some Ori on stage, showing dead squirrels and demonstrating a Slack bot to publicly shame sudo users.

Finally a guy from OpenCredo tried to explain something, but as he was constantly looking at the slides behind him, I couldn't hear what he was saying.

I'm a bit disappointed by the lightning talks of this edition. They looked too much like the bad sessions of the ParisTechTalk meetup, where talks are only an excuse to sell a product or company. This is especially surprising because speakers at all the other dotConferences take great care to avoid talking about their companies.

Content Security Policy

We then came back to the real talks with Scott Helme and CSP (Content Security Policy). I discovered CSP last October at ParisWeb, so nothing really new for me here. I still think it's something I should use more, and it got me thinking about many possible usages.

Scott

But first, let's explain what it is. Basically, it's a new header (Content-Security-Policy) that you add to your responses. Its value is a long string of directives.

It tells the browser what it can or can't do regarding the loading of external sources, and browser support is actually already quite good. You can tell the browser to only load assets from a specific CDN and your main domain, to protect you from XSS. You can fine-tune it to apply different rules to images, videos, scripts, styles, etc. You can also prevent your page from being embedded in an iframe.

By default it will block the execution of all inline scripts in your page, forcing you to load them from an external file hosted on one of the whitelisted servers. Another cool feature is to automatically disable the loading of http assets on an https page, avoiding the mixed-content warning. You can even force the browser to always upgrade http URLs to https.

But the two most interesting meta features are these: you can toggle a report-only mode that will not block anything but only display warnings in the console, which lets you test the policy and see what would break before really deploying it; and you can send those reports to an external server instead of the console, for later analysis.
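
As an illustration, here is roughly what setting such a policy could look like server-side; a minimal sketch using Flask, with hypothetical domains, starting in report-only mode as described above:

```python
from flask import Flask

app = Flask(__name__)

# Hypothetical policy: adjust the sources to your own domains and CDN.
CSP_POLICY = "; ".join([
    "default-src 'self'",
    "script-src 'self' https://cdn.example.com",
    "img-src 'self' data:",
    "frame-ancestors 'none'",
    "report-uri /csp-report",
])

@app.after_request
def add_csp(response):
    # Report-Only logs violations without blocking anything;
    # switch to "Content-Security-Policy" once the reports look clean.
    response.headers["Content-Security-Policy-Report-Only"] = CSP_POLICY
    return response
```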

I wonder how easy it is to test those CSP rules with statically generated websites (the Middleman/Jekyll type). I'm also wondering how much we could exploit this mechanism to send detailed information about the user, directly from the browser, without any third-party tracking script.

I mean, if I turn on report-only mode and disallow loading of all assets, I will receive a report for every asset the user tried to load. Doing so, I'd be able to tell, for example, which users have an adblocker (because I would never receive a "blocked" report for the ad they never tried to load). Could I also load a script from a valid source that dynamically injects a custom script with all the user data (viewport, IP, user-agent, etc.) in its URL, to log it?
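
On the receiving end, collecting those reports is just an endpoint accepting the JSON the browser POSTs; a hedged sketch, again with Flask, of what the analysis side could start from:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/csp-report", methods=["POST"])
def csp_report():
    # Browsers send the violation wrapped in a "csp-report" key,
    # with a Content-Type of application/csp-report (hence force=True).
    payload = request.get_json(force=True, silent=True) or {}
    violation = payload.get("csp-report", {})
    print(violation.get("blocked-uri"), violation.get("violated-directive"))
    return "", 204
```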

I'm sure there is really interesting data to get from this feature, in addition to increasing the overall security of apps, of course!

Diogo Monica

The next talk was my personal favorite of the whole afternoon. It was given by Diogo Monica, head of security at Docker. I did not get everything from the talk, to be honest, but it was explained in such a clear and concise way that I really liked it.

Diogo

At its core, Diogo said that updates were one of the most, if not the most, important parts of security. When an issue is found and fixed, you have to update your software to actually enjoy this fix. And how do you update? Through your package manager.

So package managers are actually a very important part of the security chain, and any security issue in the package manager itself can compromise any package it handles.

All package managers today download updates through a secure connection. This offers great protection against man-in-the-middle (MITM) attacks, but it is not enough. If the source is compromised, you are still downloading an infected update through a secure channel.

The best way to protect sources is to sign them, usually through PGP. As a maintainer, you add your PGP "stamp" on the package, and you give your public key to anybody wanting to check your packages. When users download the package, they just have to check (with the public key) that the package was actually signed by the maintainer. If an attacker changes anything in the package, the signature check will fail.

But this won't protect you against what is called a "downgrade attack". The PGP signature only lets you check that the package you've downloaded actually comes from the maintainer.

Now imagine you're using a specific piece of software in v1. You've downloaded the binary, checked that it really comes from the official source and installed it. Two weeks later, a vuln is found in this version. A v2 is released that fixes the issue. You install it through your package manager: it downloads it from the source, checks that it really comes from the maintainer, and installs it. You're safe!

Or maybe not. It only checked that the downloaded file came from the maintainer, not that it actually was the v2. Maybe you just reinstalled the exact same software. Maybe one of the servers offering the download was compromised and served v1 instead of v2, and you didn't see it, because the PGP signature doesn't cover the version, only the fact that it comes from the correct maintainer.

An easy way to check would be to have a look at your currently installed version. It sounds quite easy, but it is not the job of PGP, and it is apparently quite difficult for package managers to check that the correct version is installed.
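
A toy way to picture the missing check: the signature proves authorship, but only a version comparison defeats the replay of an old, vulnerable release. This is a conceptual sketch, not how any particular package manager implements it:

```python
def accept_update(installed_version, offered_version, signature_ok):
    """Accept an update only if it is authentic AND strictly newer."""
    if not signature_ok:
        return False                 # tampered or unsigned package
    # Without this check, a compromised mirror could replay the signed v1
    # forever and the client would happily "update" to it.
    return offered_version > installed_version

print(accept_update((1, 0, 0), (2, 0, 0), signature_ok=True))   # True: real upgrade
print(accept_update((2, 0, 0), (1, 0, 0), signature_ok=True))   # False: downgrade attempt
```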

Diogo then talked about a way to delegate power through a hierarchy of keys: one master key (stored offline) that delegates power to other keys, which can themselves delegate power. Keys at the top of the hierarchy have long expiration times while keys at the bottom have short ones.

He also talked about a way to sign a package with multiple keys, to validate for example that a specific binary has correctly passed the staging and preprod tests. I must say I did not get exactly how all of this works, but it is all part of the way Docker manages its packages.

Really nice talk; I strongly encourage you to watch the video once it's available.

Anne Canteaut

The next talk was not a very technical one, but more mathematically oriented. It was given by a cryptography researcher. She pointed out that the security breaches described in the previous talks were never about weaknesses found in the cryptographic parts.

Anne

Does that mean that the crypto part of security is perfect? Far from it. If you listen to cryptanalysts, it seems like a lot of algorithms are broken on a regular basis. But what does that mean anyway? And should we worry? Cryptographers are known to be paranoid, so should we really listen to them?

Spoiler: yes.

If cryptographers say that something is broken, you should stop using it right now. Even if there is no practical way to exploit the weakness today, there will be in the future. Such weaknesses never get better; they always get worse.

There are two ways a hash algorithm can be considered broken. The first is finding two inputs that give the same hash output. This is called a collision, and md5 is broken in that way. It is "quite easy" to do, but does not really reduce the security level in a significant way: it opens the way to other exploits, but is not too dangerous in itself.

The second way is far more dangerous. It is when you are able to find a preimage: given a specific hash output, you can craft your own input that results in that output. This differs from a collision in one important way: the output is not arbitrary, it is already given to you. It's like the birthday paradox: in any group of people, it is easier to find two people sharing a birthday than to find one person sharing yours.

If you know how to craft a preimage, you are able to create an infected version of a valid input that gives the same output as the valid one. If this happens, the algorithm is broken, and you should stop using it.
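
To get an intuition of why collisions are so much cheaper than preimages, here is a small experiment on a deliberately weakened hash (only the first 16 bits of SHA-256); real algorithms just push both costs astronomically higher:

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes) -> bytes:
    """A deliberately weak hash: only the first 2 bytes (16 bits) of SHA-256."""
    return hashlib.sha256(data).digest()[:2]

# Collision: find ANY two inputs with the same output (birthday-style, ~2^8 tries).
seen = {}
for i in count():
    h = tiny_hash(str(i).encode())
    if h in seen:
        print("collision between", seen[h], "and", i)
        break
    seen[h] = i

# Preimage: match one GIVEN output (~2^16 tries, already much slower).
target = tiny_hash(b"official release")
for i in count():
    if tiny_hash(str(i).encode()) == target:
        print("preimage found after", i, "tries")
        break
```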

The number one rule of cryptography is to always trust public analysis. Anyone can come up with a new algorithm to hash data in a supposedly secure way, but it takes a large number of eyes looking at how it's done, and brains trying to break it, to actually build confidence that it is secure.

One of the most famous, AES, was initially published in 1998 and is still used today without any practical break. It is so resilient because it is the winner of a public competition that ran for five years. The best way to make your algorithm win such a competition is to find weaknesses in the algorithms of the others; doing so, you're sure to have the brightest minds on the subject trying to break yours.

It was nice to have a glimpse of the mathematical side of this world. The subject is still very far from what I do, but interesting. I am completely unable to assess whether a cryptographic algorithm is good or not, so I trust the experts on the subject. And it's nice to see that even experts can't easily assess every algorithm, so they have to trust each other through public analysis.

Paul Mockapetris

We then had a quick Q&A session with Paul Mockapetris, inventor of the DNS. From this exchange I'll only remember the following quote:

Paul

Hardware is like milk, you want it the freshest you can find. Software is like wine, you want it with a bit of age.

Web Platform Security

The last talk I could attend was by Mike West. There was another talk after this one, but I had a train to catch so I couldn't stay until the end. You can find the slides here.

Mike

Mike started by telling us the story of Ulysses and the sirens. The sirens were beautiful female creatures with charming and dangerous voices. If men heard them, they would go mad, jump overboard and drown. Ulysses learned from Circe that the only way to protect against the sirens' singing was either to put wax in your ears or to tie yourself to the mast.

The same goes for web security. You have to tie yourself to the mast, meaning you have to follow the principle of least privilege. That way, even if an attacker manages to take control of your account, they won't be able to do much. It is better to do it that way than to try to block every possible way an attacker could compromise an account.

This is especially true for browsers. Browsers are becoming more and more powerful, accessing banking websites, connecting to Bluetooth devices and cameras, and being used by average users every day. Mike, who works on the Chrome dev team, told us about the beta features they are working on in Chrome.

He told us about CSP, as we've seen, and its limitations. If you have different applications running on different subdomains, you cannot whitelist the main domain, because that would expose all subdomains to a breach in any of them: an attacker could host a malicious script on one compromised subdomain and have it loaded in another. They are working on something that would allow better granularity on the allowed scheme, host and port.

They are also adding another way to check loaded scripts, through checksum verification. You would write your script tag as <script integrity="{hash}" src="..."> and the browser would check that the checksum of the loaded file matches the specified hash before executing it.
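
This is the Subresource Integrity mechanism. Computing the value to put in the integrity attribute is just a hash of the file, base64-encoded and prefixed with the algorithm name; a small sketch, assuming a local app.js:

```python
import base64
import hashlib

def sri_hash(path: str) -> str:
    """Compute a Subresource Integrity value for a script file."""
    with open(path, "rb") as f:
        digest = hashlib.sha384(f.read()).digest()
    return "sha384-" + base64.b64encode(digest).decode()

# Would be used as:
# <script src="app.js" integrity="sha384-..." crossorigin="anonymous"></script>
print(sri_hash("app.js"))
```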

Conclusion

A really nice first edition of dotSecurity. Another one will take place next year and I will surely go again. I'll try to be an ambassador this time as well.

The range of talks was broad, from general knowledge about the security world to more technical ones. I would say the mix was perfect.

Next year I'd like to see talks explaining in more detail the selling and buying of vulnerabilities. Who discovers them? How? Why? Who do they sell them to? For how much? What do buyers do with them, etc.

HumanTalks February 2016

This month's HumanTalks was hosted by Criteo. We had a hard time getting the laptops correctly plugged into the projector and connected to the wifi, so we started a bit late. This forced us to cut down the questions from 10 minutes to only 6 or 7.

Intro

Customer Solutions Engineer

The first talk was by Nicolas Baissas, my coworker at Algolia. He explained what the job of a CSE really means. The job originated mostly in Silicon Valley SaaS companies, but is spreading to all sorts of companies.

The goal of a CSE is to make the customers successful, by making them happy to use the product, for the longest amount of time. This is especially important for SaaS companies where revenue is based on a monthly payment. You do not want your customers to leave your service, and the best way to keep them is to offer them a better and better service.

CSE

There will always be customers leaving, which is called churn, but the goal of a CSE is to make sure the company has negative churn. This means that the customers who stay compensate for those who left, because they now use more of the service than before. The CSE must ensure that fewer and fewer customers want to leave, by understanding what they are looking for, while bringing more and more value of the product to the already happy customers.

The only way to do that is to try to be part of their team, showing that you're on their side and not just trying to sell more and more. CSEs are experts on their product, and they share that expertise with the customers by email, by Skype or, whenever possible, by meeting them directly. This happens before, during and after Algolia's deployment in their system. The CSE ensures that they use the service at its best and in the way that best fits their needs.

Month after month, they take a regular look at all the past implementations of their customers and reach out with advice if they see things that could be improved. Today Algolia has more than 1000 customers but only 4 CSEs, so this approach has trouble scaling. In parallel, they also work on making things easy for all the customers they do not have time to talk to.

They write guides, explaining specific features in detail, with examples. They have already explained the same things hundreds of times on calls, so they have experience in how to explain them clearly; then it's just a matter of writing it down. They also write detailed tech tutorials. They all have a tech background and know how to code, so they can really understand what it takes to implement Algolia in an existing system.

The goal is to automate most of the recurring work. They built a new tab in the UI that analyzes the current configuration of a customer and suggests improvements. Those are the exact same improvements they would have suggested during a one-to-one call, but because they have the experience and know how to code, they can simply make it self-service for users directly.

Some of the features are too complex to be correctly grasped just by reading the theory, like geosearch. So they created a demo with real data, using all the best practices and letting users play with it to see how it works. This worked really well, transformed a theoretical feature into a real-life application and in turn generated signups to the service.

What Nicolas really stressed is that the role of a CSE is to be as close to the customer as possible, in order to really understand, in a real-life scenario, what they want to do, with the very specifics of their project, but also to be as close as possible to the product itself, as part of the team that builds it, so they know exactly which features are ready and how they work. By doing both, you can bring your deep expertise of the service to the specific issues of the customer, while helping build the service with real-life examples of the real-life issues customers have.

A CSE's ultimate goal is to not be needed anymore: the documentation and self-service information should be enough for most users, and core developers of the service should be in direct contact with users so they know how people really use their service.

JSweet

The second talk was about JSweet, a Java-to-JavaScript transpiler. A transpiler is like a compiler: it transforms one language into another (or even into the same one). There are Java-to-Java transpilers that can refactor code, and there are already a lot of languages transpiling to JavaScript (e.g. CoffeeScript, Dart and TypeScript).

JSweet

Of these three, TypeScript seems to be the most popular today. It was originally created by Microsoft, but then Google started using it for Angular. TypeScript mostly adds a typed layer on top of JavaScript, but still lets you use vanilla JavaScript wherever you want.

There had already been attempts at Java-to-JavaScript transpilers in the past, namely GWT, but it was not as promising as announced and carried many inherent limitations. GWT is too much of a black box: you couldn't use the generated JavaScript with regular JS APIs, it quickly became outdated, and the promise of having all of Java in JavaScript wasn't even fulfilled. Mostly, it was made for developers who did not want to learn JavaScript and just wanted their backend code to work on the frontend as well.

Later on, we saw the emergence of NodeJS, where one of the cool features was that you could use the same language on the backend and the frontend. NodeJS running JavaScript, it meant that you had to ditch your old Java/Ruby/PHP backend and put NodeJS in its place. JSweet follows the same logic of "same language everywhere", but this time it lets you write the JavaScript side in Java.

TypeScript's syntax being really close to Java's, it is easy to transpile from one to the other. And since TypeScript transpiles to JavaScript, you can transpile all the way from Java to JavaScript.

This lets you use all your JavaScript libraries (like underscore, moment, etc) directly in your Java code. And you can also write your frontend code with Java, letting you follow the paradigm of "one language everywhere". Internally your Java code will be transpiled to TypeScript, then to JavaScript. Not all of Java will be available in JSweet, though.

I never coded in Java so I am unsure how useful this really is, but it seemed like a nice project if you want to keep the same language from back to front.

Shadow IT

The next talk, by Lucas Girardin, was about what is called Shadow IT. Shadow IT encompasses all the IT devices and tools in a company that fly under the radar of the official IT department: employees' cell phones used to check personal email during the day, quantified-self devices (Fitbit, etc.), Excel files filled with custom macros, personal Dropbox accounts and even contracts with external freelancers that are not approved by the IT department.

ShadowIT

Granted, this kind of issue only occurs in big companies, where there are way too many layers between the real needs of employees and the top hierarchy trying to "rationalize" things. This talk echoed the first talk about CSE nicely and reminded me why I quit consulting ;).

Anyway, the talk was still interesting. It started by explaining why these kinds of shadow developments appeared in the first place: mainly because the tools employees are given are not powerful enough to let them do their job properly. And because employees are getting more and more tech-savvy, they find their own ways to bypass the restrictions. They expect the same level of features in their day job as at home or on their smartphone; if their company cannot provide it, they will find other ways.

Unfortunately, these ways are frowned upon by the IT department. Maybe the Excel sheet the employee is creating is really useful, but maybe it is also illegal with regard to personal data storage. Or it will break as soon as the Excel version changes, or be completely lost when the employee leaves and no backup exists.

Then I started getting lost in the talk. Some of the concepts he talked about were so alien to what I experience every day that I had trouble understanding what it was really about. In the end, he suggested a way to rationalize those various independent parts by building a Platform that lets users build their own tools, even if they do not know how to code. This platform would get its data from a Referential, accessible company-wide, holding the only truly trustworthy source of data. And finally, the IT department would build Middlewares to help application A communicate with application B.

In the end, the IT department would stop building custom applications for its employees and simply provide the tools to help them build their own. Still, it would have to create the middlewares to let all those parts talk to each other.

I cannot help but think that this does not fix the initial issue but simply gives the IT department the feeling that it is in control again. As soon as the platform tools become too limited for employees to really do what they want (and this will happen really quickly), they will revert to other, more powerful tools, which will again be out of the IT department's reach. I fail to see how this is any different from before, except that instead of building the applications itself, the IT department now builds the tools so employees can build the applications; it is still needed to make them work together and will still be a bottleneck.

You can find the slides here

Criteo Hero Team

The last talk was presented by Michel Nguyen and was about the Criteo Hero Team.

Michel told us about his team, the Hero team, for Escalation, Release and Ops. He added an H in front to make it cooler. The team is in charge of all production releases as well as dealing with incidents.

Hero team

They realized that whenever they put something in production, something else would break. So they started coordinating the releases and putting all the people who understand the infrastructure in the same team, to better anticipate where things could break.

They now use a 24/7 "follow the sun" schedule where teams in France and the US are always awake to follow up on potential issues. They have a two-layer escalation system that lets them deal with minor issues without creating a bottleneck for major ones. The Hero team is in charge of finding the root causes of issues, and if none can be found quickly enough, they just find a temporary workaround and dig deeper later. Once an issue is found and fixed, they do a postmortem to share with everybody what went wrong, how they fixed it, and ways to prevent it from happening again.

They use Centreon and Nagios as part of their monitoring and, after each production release, check the state of the metrics they follow to see if anything abnormal appeared. If too many metrics change too widely, they can assume something is not working correctly.

The current production environment of Criteo is about 15,000 servers, which weigh as much as 6 Airbuses and would be twice the height of the Empire State Building. They handle about 1200 incidents per year and resolve about 90% of them in escalation. The last 10% are incidents that depend on third parties, or one-shot incidents they never understood.

To be honest, even if the talk was interesting (and Michel is a very good speaker), it felt too much like a vanity metrics contest. I know Michel had to cut his talk from 30 minutes down to 10 to fit the HumanTalks format, so I'd like to see the full version of it.

Conclusion

I did not feel like I was the target audience of the talks this time. I already knew everything about the CSE job because I work with CSEs every day, I have never coded in Java, I stopped working in companies big enough to have a Shadow IT issue and, as I said, the last one left me hungry for more.

Still, nice talks and nice chats afterwards. Except for the small hiccup with the projector and wifi at the start, the room was perfect and very comfortable and the pizza + sushi buffet was great.

Next month we'll be at Viadeo. Hope to see you there!

HumanTalks January 2016

Note: I'm actually writing this blog post several months after the event, so my memory might not be the freshest here.

Anyway, we had 4 talks as usual for the HumanTalks, hosted at Deezer this time, with food courtesy of PaloIT.

Apache Zeppelin

The first talk was by Saad Ansari, an introduction to Apache Zeppelin. Their use case was that they had a huge amount of tech data (mostly logs) and no idea what to do with it.

They knew they should analyze it and extract relevant information from it, but they had so much data, in so many forms, that they didn't really know where to start. Sorting it manually, even just to find which data was interesting and which was garbage, was too long a task to be feasible.

So they simply pushed it to Zeppelin. It understands the global structure of the data and displays it in tables and/or graphs. It basically expects CSV data as input and then lets you use a SQL-like syntax to query it and display visual graphs. The UI even provides drag'n'drop for easier refinement.
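
Not Zeppelin itself, but the same "dump CSV in, query it with SQL" workflow can be sketched with nothing but the standard library, assuming a hypothetical logs.csv file with a header row and a status column:

```python
import csv
import sqlite3

conn = sqlite3.connect(":memory:")
with open("logs.csv", newline="") as f:
    rows = list(csv.reader(f))
header, data = rows[0], rows[1:]

# Load the CSV into an in-memory table, then explore it with plain SQL.
conn.execute("CREATE TABLE logs ({})".format(", ".join(header)))
conn.executemany(
    "INSERT INTO logs VALUES ({})".format(", ".join("?" * len(header))), data
)
for status, total in conn.execute("SELECT status, COUNT(*) FROM logs GROUP BY status"):
    print(status, total)
```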

I was a bit confused as to who the target of such a tool was. It is definitely not for BigData experts, because the tool seems too basic. It wouldn't fit someone non-technical either, because it still requires writing SQL queries. It's for the developer in between, to get an overall feel of the data without being too fine-grained. Nice for a first overview of what to do with the data.

As the speaker put it, it's the Notepad++ of BigData. Just throw garbage CSV and logs in it, and then play with SQL and drag'n'drop to extract some meaning from it.

The Infinite, or why it is smaller than you may think

The next talk, by Freddy Fadel, was a lot more complex to follow. I actually stopped taking notes to focus on what the speaker was saying and to try to grasp the concepts.

It was about the mathematical definition of infinity, what it implies, and how we can actually count it. I really cannot explain it better than that, but it was still interesting.

At first I must say I really wondered what such a talk was doing at a meetup like the HumanTalks, and I was expecting a big WTF moment. I was actually pleasantly surprised to enjoy the talk.

Why is it so hard to write APIs?

Then Alex Estela explained why it is so complex to build APIs, or rather, what the main points are that people fail at.

First of all, REST is an exchange interface. Its main and sole purpose is to ease exchanges between two parties. The API will only be as good as the communication between the various teams building it.

REST is just an interface: there is no standard and no specific language or infrastructure pattern to apply. This can be intimidating, and it gives many possible ways to fail at building it.

Often, people build REST APIs like they built everything else, thinking in SOAP terms and exposing actions, not resources. Often, they build an API that only exposes the internals of the system, without any wrapping logic. You also often see APIs that are too tailored to the specific needs of one application, or on the other hand APIs that let you do anything but were built with no specific use case in mind, so you have to reverse-engineer them yourself to get things done.
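
To make the "resources, not actions" point concrete, here is a minimal hypothetical sketch (Flask, with a made-up orders resource): instead of an /executeOrder style endpoint, you expose the order itself and let the HTTP verbs carry the actions.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory "orders" resource, purely for illustration.
ORDERS = {1: {"id": 1, "status": "shipped"}}

@app.route("/orders/<int:order_id>", methods=["GET"])
def get_order(order_id):
    order = ORDERS.get(order_id)
    if order is None:
        return jsonify(error="not found"), 404
    return jsonify(order)

@app.route("/orders/<int:order_id>", methods=["DELETE"])
def cancel_order(order_id):
    # The "cancel" action is expressed through the verb, not the URL.
    ORDERS.pop(order_id, None)
    return "", 204
```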

The technical aspect of building an API is not really an obstacle: it's just basic JSON over HTTP. Even HTTP/2 does not radically change things; it will just need a few adjustments here and there, but nothing too hard. The issue is the lack of standards, which gives too many opportunities to do things badly. You can use specs like Swagger, RAML or Blueprint; they are all good choices with strengths and weaknesses. Pick one, you can't go wrong.

There is no right way to build an API in terms of methodology. The one and only rule you have to follow is to keep it close to the users. Once again, an API is a means of communication between two parties. You should build it with at least one customer using it. Create prototypes, iterate on them, use real-world data and use cases, deploy on real infrastructure. And take extra care of the Developer Experience: write clear documentation, give examples of how to use it, showcase what can be built with it. Use it. Eat your own dog food. Exposing resources is not enough, you also have to consume them.

Make sure all teams that are building the API can easily talk to each other in the real world and collaborate. Collaboration is key here. All sides (producer and consumer) should give frequent feedback, as it comes.

To conclude, building an API is not really different from building any app. You have to learn a new vocabulary, rethink a bit the way you organize your actions and data, and learn to use HTTP, but it's not really hard.

What you absolutely need are users. Real-world users that will consume your API and use it to build things. Create prototypes, stay close to the users, get feedback early and make sure every actor of the project can communicate with the others.

Why do tech people hate sales people?

The last talk of the day was by Benjamin Digne, a coworker of mine at Algolia. In a funny presentation (with highly polished slides⸮) he explained why devs usually hate sales people.

The talk was all the more interesting as he is a sales person himself. Ben has always sold stuff, from cheeseburgers to human beings (he used to be a hyperactive recruiter in a previous life).

But he realized that dealing with developers is very different from what he did before. This mostly comes from the fact that the two worlds speak different languages. If you overly stereotype each side, you'll see the extroverted salesman driven only by money and the introverted tech guy who spends his whole day in front of his computer.

Because those two worlds are so different, they do not understand each other. And when you do not understand something, you're afraid of it. This really does not help build trust between the two sides.

But things are not so bleak; there are ways to build bridges between the two communities. First of all, one has to understand that, historically, sales people were the super-rich superstars of big companies, while techies were the nobodies locked in a basement somewhere.

Things have changed, and the Silicon Valley culture is making superheroes out of developers. Still, mentalities did not switch overnight and we are still in a hybrid period where both sides have to understand what is going on.

Still, the two worlds haven't completely merged. Try to picture for a minute where the sales people's offices are located at your current company, and where R&D is. Are they far apart, or do they work together?

At Algolia, we try to build those bridges. We start by hiring only people with a tech background, no matter their position (sales, marketing, etc.), which makes speaking a common language easier. We also run what we call "Algolia Academies", where the tech team explains to non-tech employees how parts of the software work. On the other side, we have "Sales classes", where the sales team explains how they build their pitches and what a typical sales situation looks like. This helps each side better understand the other's job.

We also have a no-fixed-seats policy. We have one big open space where all employees (including the founders) sit. We have more desks than employees, and everyone can change desks at any time. Today we have a JavaScript developer sitting between our accountant and one of our recruiters, a sales guy next to an ops person, and another one next to two PHP developers. Mixing teams like this really helps avoid invisible walls.

Conclusion

The talks this time were kind of meta (building an API, sales vs. tech people, infinity) and not really tech-focused, but that's also what makes the HumanTalks so great. We do not only talk about code, but about everything that happens around the life of a developer. Thanks again to Deezer for hosting us and to all the speakers.

Paris Vim Meetup #11

I went to the 11th Paris Vim Meetup last week. As usual, it took place at Silex Labs (near Place de Clichy), and we were a small group of about 10 people.

Thomas started by giving a talk about vimscript, then I talked about Syntastic. And finally, we all discussed vim and plugins, and common issues and how to fix them.

Pictures

Do you really need a plugin?

This talk by Thomas explained what's involved in writing a vim plugin. It covered the vimscript aspect and all its quirks and was a really nice introduction for anyone wanting to jump in. Both Thomas and I learned vimscript from the awesome work of Steve Losh and his Learn Vimscript the Hard Way online book.

Before jumping into making your own plugin, there are a few questions you should ask yourself.

Isn't this feature already part of vim? Vim is such a huge piece of software that it is filled with features that you might not know. Do a quick Google search or skim through the :h to check if this isn't already included.

Isn't there already a plugin for that? Vim has a huge ecosystem of plugins (of varying quality). Check the GitHub mirror of vim-scripts.org for an easy to clone list.

And finally, ask yourself if your plugin is really useful. Don't forget that you can call any commandline tool from Vim, so maybe you do not have to code a whole plugin if an existing tool already does the job. I like to quote this vim koan on this subject:

A Markdown acolyte came to Master Wq to demonstrate his Vim plugin.

"See, master," he said, "I have nearly finished the Vim macros that translate Markdown into HTML. My functions interweave, my parser is a paragon of efficiency, and the results nearly flawless. I daresay I have mastered Vimscript, and my work will validate Vim as a modern editor for the enlightened developer! Have I done rightly?"

Master Wq read the acolyte's code for several minutes without saying anything. Then he opened a Markdown document, and typed:

:%!markdown

HTML filled the buffer instantly. The acolyte began to cry.

Anybody can make a plugin

Once you know you need your plugin, it's time to start, and it's really easy. @bling, the author of vim-bufferline and vim-airline, two popular plugins, didn't know how to write vimscript before starting those two. Everybody has to start somewhere, and it is better to write a plugin that you would use yourself; it will give you more motivation to do it.

A vim plugin can add almost any new feature to vim. It can be new motions or text objects, a wrapper around an existing commandline tool, or even some syntax highlighting.

The vimscript language is a bit special. I like to say that if you've ever had to write something in bash and did not like it, you will not like vimscript either. There are initiatives, like underscore.vim, to bring a bit more sanity to it, but it is still hackish anyway.

Vimscript basics

First things first, variables. You assign variables in vimscript with let a = 'foo'. If you ever want to change the value of a, you'll have to re-assign it, using the let keyword again.

You add branching with if and endif, and loops with for i in [] and endfor. Strings are concatenated using the . operator, and list elements can be accessed through their index (it even understands ranges and negative indices). You can also use Dictionaries, which are a bit like hashes, where each key is named (but will be coerced to a string, no matter what).

You can define functions with the function keyword, but vim will scream if you try to redefine a function that was already defined. To suppress the warning, just use function!, with a final exclamation mark. This is really useful when developing and sourcing the same file over and over again.

Variables in vimscript are scoped, and the scope is defined in the variable name. a:foo accesses the foo argument, while b:foo accesses the buffer variable foo. You also have w: for window and g: for global.

WTF Vimscript

And after all these basics, we start to enter what-the-fuck territory.

If you try to concatenate strings with + instead of . (maybe because you're used to coding in JavaScript), things will kind of work. + will actually force the variables to become integers. But in Vimscript, if a string starts with an integer, it will be cast to that integer: 123foo will become 123. And if it does not, it will simply be cast to 0: foo will become 0.

This can get tricky really quickly, for example if you want to react to the word under the cursor and do something only if it is an integer. You'll have a lot of false positives that you do not expect.

Another WTF‽ is that the equality operator == actually depends on the user's ignorecase setting. If you :set ignorecase, then "foo" == "FOO" will be true, while it will stay false if the option is not set. Having the default equality operator depend on user configuration is... fucked up. Fortunately, Vimscript also has the ==# operator, which is always case-sensitive, so that's the one you should ALWAYS use.

Directory structure

Most Vim plugin packagers (Bundle, Vundle and Pathogen) expect you, as a plugin author, to put your files in specific directories based on what they do. Most of this structure is actually taken from the vim structure itself.

ftdetect holds the code used to assign a specific filetype to files based on their name. ftplugin contains all the specific configuration to apply to a file based on its filetype (so those two usually work together).

For all the vim plugin writers out there, Thomas suggested using scriptease, which provides a lot of debugging tools.

Tips and tricks

Something you often see in plugin code is execute "normal! XXXXX". execute lets you pass the command to execute as a string, which allows you to build that string from variables. The normal! part tells vim to execute the following set of keys just as in normal mode; the ! at the end of normal is there to bypass user mappings. With everything wrapped in an execute, you can even use special chars like <CR> to act as an Enter press.

Syntastic

After Thomas' talk, I briefly talked about Syntastic, the syntax checker for vim.

I use Syntastic a lot, with various linters. Linters are commandline tools that analyze your code and report possible errors. The most basic ones only check for syntax correctness, but some can even warn you about unused variables, deprecated methods or style violations (like camelCase vs snake_case naming).

I use linters a lot in my workflow, and all the code I push goes through a linter on our Continuous Integration platform (TravisCI). Travis is awesome, but it is asynchronous, meaning I will receive an email a few minutes after my push if the build fails. And this kills my flow.

This is where Syntastic comes into play. Syntastic gives you instant linter feedback while you're in vim. The way I have it configured, it runs the specified linters on the file I'm working on whenever I save it. If errors are found, they are displayed on screen, on the lines that contain them, along with a small text message telling me what I did wrong.

It is then just a matter of fixing the issues until they all disappear. Because the feedback loop is so quick, I find it extremely useful when learning new languages. I recently started a project in Python, a language I had never used before.

The first thing I did was install pylint and configure Syntastic for it. Every time I saved my file, it was like having a personal teacher telling me what I did wrong, warning me about deprecated methods and teaching me the best practices from the get-go.

I really recommend adding a linter to your workflow as soon as possible. A linter is not something you add once you know the language, but something you use to learn the language.

Syntastic has support for more than a hundred languages, so there's a great chance that yours is listed. Even if your language is not in the list, it is really easy to add a Syntastic wrapper for an existing linter. Without knowing much Vimscript myself, I added 4 of them (Recess, Flog, Stylelint, Dockerfile_lint). All you need is a commandline linter that outputs errors in a parsable format (JSON is preferred, but any text output can work).

Conclusion

After those two talks, we all gathered together to discuss vim in a more friendly way, exchanging tips and plugins. Thanks again to Silex Labs for hosting us, this meetup is a nice place to discover vim, whatever your experience with it.

HumanTalks December 2015

Note: These notes are from Adrien. Thank you very much Adrien!

For December, the HumanTalks were at LeBonCoin. After several days of conferences with dotJS, dotCSS and apiDays, Tim needed some typing REST and delegated it to me.

Polyfill and Transpilers, one code for every browser

The first talk, by Alexandre Barbier (@alexbrbr), explained the why and how of progressive enhancement in JS.

One of the main tasks of web developers is to ensure compatibility across browsers. And while things are getting easier (less painful) with the end of life of several IE versions, the web is still the most hostile environment.

Once the target browsers have been defined, there are two different ways to do it.

Using polyfills, which consist in reimplementing an API in pure JS when it is not available: first you detect if the feature is available, and if it is not, you implement it yourself.
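
The talk was about JavaScript, but the detect-then-implement pattern itself is language-agnostic; here it is transposed to Python purely for illustration, back-filling math.isclose when it is missing:

```python
import math

# The polyfill idea: detect whether a feature exists, and only provide a
# fallback implementation when it does not (math.isclose exists since 3.5).
if not hasattr(math, "isclose"):          # the "feature detection" step
    def _isclose(a, b, rel_tol=1e-09, abs_tol=0.0):
        return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
    math.isclose = _isclose               # the "polyfill" step

print(math.isclose(0.1 + 0.2, 0.3))
```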

If you want to use the latest features of JS, the ones not yet implemented in browsers (such as EcmaScript 6/2015), you need to use a transpiler (a source-to-source compiler). More than 270 languages target JS, from CoffeeScript to ClojureScript, along with Dart and TypeScript. One of the most used is Babel, which just hit its 6th version.

Transhumanism, the singularity advent

After transpilers, transhumanism, by Pierre Cointe (@pierre_cointe).

The talk presented the history of transhumanism, which came from eugenics as a way to willingly evolve the human species, and the NBIC technologies (nanotech, biotech, information technology, cognitive science) associated with it.

Pierre presented some of the projects associated with it, such as immortality, genome manipulation and consciousness transfer. Then he presented some of Raymond Kurzweil's predictions, which rely on an extended Moore's law to place the singularity around 2030, the singularity being the point in time when a supercomputer becomes more powerful than a human brain.

Developing applications for TV

The next talk was by Mickaël GREGORI (@meekah3ll), who presented his experience developing applications for television.

Not that friendly a place either, with no standards, disparate SDKs, XML files everywhere... After presenting the market, he focused on three products: Chromecast from Google, Roku, and Android TV. Most of the applications consist in creating a new dynamic TV channel.

To conclude, he talked a bit about a standard that may be on its way, as the W3C is working on a TV API.

How to be more productive with three methods

The fourth and last talk was given by Thibault Vigouroux (@teaBough).

He presented 3 methods he uses every day to be more effective at what he does.

The first one was the Pomodoro technique, which consists in working 25 minutes on a task, fully focused, then taking a 5-minute break to rest and let the brain work in diffuse mode. He told us about Pomodoro Challenge, a flexible application that relies on gamification to get you used to the practice.

Then he presented a way to help choose a task: the Eisenhower matrix. For it, you label your tasks according to two criteria: importance and urgency.

Basically, you do what's both important and urgent, you delegate what's urgent but not important, and you decide when to do what is important but not urgent. (Note how I deleted the section about non-important, non-urgent stuff.)

To finish, he talked about how to get better at a task with deliberate practice, which he used to switch to the Colemak keyboard layout. Five components are vital for this:

  1. Being focused on improving the result
  2. Immediate Feedback
  3. Easily repeatable exercises
  4. Not really pleasant
  5. Mentally Intense

Conclusion

Very diverse and interesting talks as usual. :) A good meetup to conclude the year!