Mid-September I was in Prague for the WriteTheDocs conference. I went there with four of my colleagues to learn how to improve our documentation. We discovered much more than we initially expected.
First impression
From the first talk I realised that I actually did not know much about the community I had joined. I expected it to be composed of developers who also write documentation and enjoy it. But when the first speaker introduced himself as an engineer, and it was apparently something worth specifying, I knew I was in for some surprises.
I then discovered that there is such a job as "Technical Writer". After two days of conference I'm still not sure what it means, to be honest. From what I gather, they are people with writing skills who know how to convey information in a clear and concise way. They can translate complex concepts into simpler words so others can understand them with minimal effort. A technical background is not mandatory, but asking questions is paramount. They have to deeply understand the subject to be able to synthesize it.
Documentation is code
Throughout the conference, I saw talks explaining how important documentation is and why it should not be added as an afterthought. People were pointing out issues in the way documentation is done and suggesting ways to fix them.
At their core, the issues people have with documentation are the same issues we have with code (quality, bloat, complexity, etc.). The suggested solutions were also the same ones we apply to code (user testing, automated testing, linters, short feedback loops, etc.).
Language as code
Good writing is how you run words in someone else's brain to spark ideas. It's no different from code you execute. If you write bad code, your code will do bad things. This is exactly as true for documentation.
Documentation is as important as code, because it is like code. Language is brain code. Every word will journey through the reader's mind. You must be careful to only send important information, as fast as possible, and avoid overflow.
Syntax is paramount, and ambiguity must be avoided as it slows the process down. Readers shouldn't have to read a whole sentence before getting its meaning. They should be able to process it as it comes. It's the same as loading a big file into RAM versus reading it line by line.
Docs or it didn't happen
Documentation is as important as code when it comes to features. If undocumented, any feature is outdated as soon as it's shipped. If you're in a Scrum environment, it means documentation should be part of the Definition of Done (DoD) of any feature.
As a developer, I will always add tests for any new feature. This is how I prove that the feature works. Writing documentation is how I prove that the feature actually exists.
If you see a GitHub repository with an empty readme, you'll assume the project is unfinished. If you see a project without documentation, you'll assume it's not usable.
This is even more true if you're documenting an API. User testing with eye-tracking showed it: when confronted with a new API, everybody searches for the documentation first. Then they look for live examples and code samples.
And like tests, the logical next step is the documentation equivalent of TDD: Documentation Driven Development. Start by writing the documentation, and then write the feature. Documenting the user-facing API before writing any code will let the best API emerge by itself.
Write drunk, edit sober
As developers, we spend more time fixing bugs and adding features than coding the initial skeleton. The same happens when writing documentation. Great documentation requires hundreds of tweaks and rewrites, and no one ever gets it right on the first try.
Writing and editing require vastly different states of mind. Write a first draft to dump your ideas. Don't bother with typos or grammatical errors; write down everything you want to say, to get a rough word count. Then let it rest for a couple of hours, or even days, before editing it.
Keep It Simple, Stupid
Define a shared style guide, with the voice and tone you want to keep consistent throughout your documentation. Your readers should not feel like they are reading a different author on each page.
Writing documentation is easy. Anybody can do it. What is hard is writing something that will be understood and remembered by the reader. The key is brevity and simplicity. Remove words and sentences until you think there is nothing left to remove. Then remove some more. And remember what someone famous once said: if I had more time, I would have written a shorter letter.
People will come to your pages from search engines. They won't read from top to bottom but can jump to any part of the page. They will scan the content, so help them identify what each section is about. Each of your paragraphs should explain exactly one idea and should explain it clearly (in perfect UNIX-style).
Tips'n'tricks
A good story ends with a satisfying finish, not in the middle of a cliffhanger. At the end of any page, list what has been learned, show what can be built with this knowledge or add links to the next steps.
Tools can make your life easier. They can even be plugged into a Continuous Integration service. Don't waste time doing what a computer can do better and faster than you. Focus on where you bring value.
Spend time with your users. Immerse yourself into the support team and see the real issues your users are facing. Schedule regular user-testing sessions. They are an invaluable way to know the real issues that need documenting.
Add code samples because that's the first thing developers read. Add video tutorials for beginners and interactive jsfiddles for experienced users. Don't hesitate to add pictures to explain complex concepts.
All good writers are avid readers, so read. It will give you more words to enrich your vocabulary, and thus more ways to express nuances. This is even more true if you're not a native English speaker. Translating books into other languages is also a great way to improve your writing skills.
Conclusion
Even if not exactly what I was expecting, the event was a success. We all learned a lot, met interesting people, and even had the chance to pitch DocSearch. I think I will come again next year.
We might even submit a talk, because I feel that the way we write documentation at Algolia is on the right track, even if a bit unusual. We write the documentation of the features we develop, and we also do the support for them. It puts us in a virtuous circle of feedback, bug fixing and documentation improvement.
We like doing support, but we'd rather spend our time adding new features. So improving the documentation and fixing bugs is our way to ensure we spend less time on support, which is a good motivation.
Thanks to all the organizers, speakers and attendees, and I hope to see you next year!
For the September session of the HumanTalks, Sfeir agreed to host us. Everybody was coming back from vacation (including us organizers), but we managed to find 4 speakers and a venue in a short amount of time. Tasty bagels and a nice terrace under the warm September sky are a perfect way to start a new year.
FedEx days
The first talk was about an experiment done at NanoCloud, something they call FedEx days. They did not invent the principle, and you should be able to find articles on the subject online.
The goal of FedEx days is to bring coworkers together in a timeboxed session to work on a specific subject of their choosing. They marked a day in their calendar, and one week before they asked everybody to suggest ideas for projects that could be built. Projects need to bring some value to the company, and need to be something you can actually build (no wild dreams here). Then everybody picks a suggestion and forms a team of about 4 people to work on it. Having no more than 4 people per team creates a climate of healthy competition.
In their iteration, most people chose a group based on personal affinities with others, so there was little diversity in the teams. It's something they would like to change for the next event.
What was nice, though, was bringing together in the same team people who have been in the company for a long time and recent hires. It helped the new hires get to know their coworkers and the product better. They plan to make it an "official" part of the onboarding process for future hires.
I'm personally unsure this is something I'd like to do on a regular basis. Or, let me rephrase: I think small teams of friendly coworkers working on small features that bring value to the company is what every day should look like. But actually immersing new hires in a team when they join, so they can get to know their coworkers, gain a deeper understanding of the product and bring something to the company, seems like a good idea.
ES Next Coverage
The second talk was far more technical, about a code coverage tool. Oleg, from Sfeir, is the developer of esnext-coverage. We saw how important testing code is, but also how coverage of the tested code is an indicator of test quality.
Not everybody does testing, and even fewer people measure coverage. Configuring the whole test/coverage stack is time-consuming and often discouraging. Oleg tried to make it as simple as possible with esnext-coverage, so developers can't hide behind excuses.
The tool allows exporting the results as JSON or HTML, as well as through any custom formatter. This is an improvement over Istanbul, which apparently has a limited list of built-in formatters. The default HTML formatter creates a single-page application that you can use to browse through the code.
We then saw how all this works in practice, including a live-coding session on the instrumentation phase, where the initial code is transformed into other code that "counts" each time a line is executed. The whole demo was done in AST Explorer, an online AST (Abstract Syntax Tree) console.
Basically, every method call or variable assignment is wrapped in a call to a function that increments the counter associated with the current line.
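To make that more concrete, here is a hand-written sketch of what instrumented code conceptually looks like. It is not the actual output of esnext-coverage; the track helper and the line numbers are illustrative assumptions.

```typescript
// Per-line hit counters that the instrumenter would collect at runtime.
const hits: Record<number, number> = {};

// Helper injected by the instrumenter: bump the counter for a line, then
// return the original value so the expression keeps its meaning.
function track<T>(line: number, value: T): T {
  hits[line] = (hits[line] ?? 0) + 1;
  return value;
}

// Original code:        const total = add(2, 3);
// Instrumented version: every expression is wrapped in a call to track().
function add(a: number, b: number): number {
  return track(3, a + b);
}
const total = track(5, add(track(5, 2), track(5, 3)));

console.log(total); // 5
console.log(hits);  // e.g. { 3: 1, 5: 3 } -> which lines ran, and how often
```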
Interesting talk; I always like to see people write code that writes code. I like the meta aspect of it. Also, Oleg is a great speaker, which made the underlying concepts easier to understand.
Taking the hard out of Hardware
Alex Bucknall from Sigfox tried to show us how hardware isn't actually so hard, and why it's closer to software than we might think.
It looks complicated because we cannot see it. But so is software, and concepts between hardware and OOP are often the same. Classes, Objects, Inheritance, all of that can be found in hardware as well.
Alex explained how the basics of hardware are actually simple, and how more complicated pieces are built out of simpler blocks and then packaged under a new name. And even more complex blocks are built out of those previous ones, and so on.
Everything looked easy when Alex explained it, but it still felt too fuzzy to me. There are concepts and words I don't get (frequency, voltage, etc.), so I had trouble following.
If anything, it nicely introduced what Sigfox is doing, namely hardware blocks that can communicate with each other through familiar HTTP/API/callback mechanisms.
What is a stand-up meeting?
Last talk of the day was about the stand-up meeting ritual, dear to the Agile methodologies, presented by Thibaut Cheymol.
Stand-up meetings must answer three specific questions: What did I do yesterday? What will I do today? What issues did I face? All three questions must be answered without forgetting the current sprint goal.
If done right, the stand-up meeting boosts the motivation of the team, giving everyone a clear picture of what they are going to do today and, more importantly, why.
But we all know that the stand-up meeting can turn into a reporting game where everybody waits for their turn to speak and report in a military fashion what they did the day before.
The advice Thibaut gave to avoid that was to:
Actually stand up. No chairs, no leaning against the wall, no computer.
15 minutes tops. More than that and it gets boring.
Start on time. Be concise. No discussion, no interruption.
Be prepared (~5 minutes the day before)
Other, more general advice was to always talk about what was done, not what is in progress, and to talk about the end goal ("I made sure that people could buy the product") instead of the actions taken to achieve it ("I implemented the API responses").
Not everything goes as planned; we often have issues that need to be dealt with in the moment, delaying what we had planned to do today. In that case, we need to fix issues as they arise, but talk about them at the next stand-up meeting. To avoid having discussions and/or people talking for too long, Thibaut suggested the following three-round approach:
Everybody, in turn, tells what the issues were
Everybody, in turn, tells what actions they took to fix the issues
Everybody, in turn, shares the results of the actions taken the day before
I haven't done a single stand-up meeting in the past year and a half, and I don't miss it at all. I did a lot of them in my previous job, but never perceived them as a valuable tool (maybe because we fell into the classic traps). I'll keep this advice in mind if I ever have to do stand-up meetings again.
Conclusion
Thanks again to Sfeir and all the speakers. Next month we'll be at Prestashop, with 4 new talks. Hope to see you there.
After dotSecurity last Friday, it was now time for dotScale. It follows the same pattern as the dotCSS/dotJS conferences: one short event (an afternoon) on the Friday and one longer (a full day) on the following Monday.
As usual, it took place in the Théâtre de Paris. The main issue of the last dotJS held there (the main room getting hotter and hotter throughout the day) was fixed, which let us fully enjoy the talks. There were more than 800 attendees this time.
Mickael Rémond
The first talk was by Mickael Rémond, about the Erlang language. It was a high-level talk, explaining the history of the language.
The story started in 1973, in big telecommunication firms. Erlang follows what is called the Actor Model. Each actor has a single responsibility and can send messages to other actors. It can also create new actors, and it processes its incoming messages sequentially, one at a time. Each actor is an independent entity; they share nothing.
By definition they are scalable, because they all do the same thing without being tied to any specific state or data. You can distribute them on one machine, or across several machines in a cluster.
In 1986, Erlang was officially born. It added a second principle: "let it crash". As actors have a single responsibility, they do not have to bother handling errors. If one fails, it crashes, and the whole language embraces that. Other actors can be added on top of the first to handle errors, but the code for an actor's main responsibility does not have to care about them.
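To give an idea of what the actor model looks like in code, here is a minimal, single-process sketch in TypeScript. Erlang gives you this for free, plus distribution and supervision, at the VM level; the class and message names below are my own assumptions, not Erlang APIs.

```typescript
// Messages are the only way to interact with an actor; nothing is shared.
type Message = { type: "increment" } | { type: "get"; reply: (n: number) => void };

class CounterActor {
  private mailbox: Message[] = [];
  private count = 0;
  private running = false;

  // Other actors (or the outside world) only ever send messages.
  send(msg: Message): void {
    this.mailbox.push(msg);
    if (!this.running) this.processMailbox();
  }

  // Messages are processed sequentially, one at a time.
  private processMailbox(): void {
    this.running = true;
    while (this.mailbox.length > 0) {
      const msg = this.mailbox.shift()!;
      if (msg.type === "increment") this.count += 1;
      else msg.reply(this.count);
    }
    this.running = false;
  }
}

const counter = new CounterActor();
counter.send({ type: "increment" });
counter.send({ type: "increment" });
counter.send({ type: "get", reply: (n) => console.log(n) }); // 2
```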
More than ten years later, in 1998, Erlang was finally released as open source. The most interesting features of Erlang are not in the language itself, but in the primitives implemented by the Erlang VM. Today, new languages are being built on top of that same VM, like Elixir (created by a former Ruby on Rails core team member, and gaining popularity lately).
Overall the talk was a nice way to start the day. Even if not directly a story of scaling, it was great generic knowledge.
Vasia Kalavri
The next talk continued in that general-knowledge vein, with Vasia Kalavri telling us more about graph databases. I didn't really see the link with scaling here either, I must say. The slides are available here.
In graph databases, relationships between objects are first-class citizens, even more than the objects themselves. You can run machine learning algorithms on those relationships and infer recommendations, like "who to follow" features. Even search in a graph database is nothing more than a shortest-path algorithm. Even the Panama Papers leak was mapped using a graph database.
When we start talking about big data, we often think about distributed graph processing. Vasia was here to tell us that distributed processing is not always the right answer for graph databases. Because we are mapping relationships between objects, the size of the dataset to map does not correlate directly with the final size of the data in memory. You might think that with your terabytes of user data you need a huge cluster to compute everything. Actually, to process its "who to follow" feature, Twitter only needs 64GB of RAM. And that's more than 2 billion users. When you start thinking you'll need a distributed cluster to host all your data, please think again. The chances that you have more users than Twitter are actually quite low.
If storage space is not a relevant argument, maybe speed is? You might think that using a cluster of several nodes will be faster than using one single node. Actually, this is not true either. The biggest challenge in building graph databases is cleaning the input. You will aggregate input from various sources, in various formats, at various intervals. First thing you have to do is clean it and process it, and this is what will take the longest time. When you think of it, your data is already kind of distributed across all your sources. Just process/clean it where it is before sending it to your graph database.
The talk then went deeper into the various applications of graph databases, like iterative value propagation, which is the algorithm used for calculating things like Google's PageRank. You basically compute a metric on each node and aggregate it with the same metric from the neighboring nodes.
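Here is a tiny sketch of that idea, PageRank-style, on a toy graph. This is the textbook iterative algorithm, not the Flink or Spark specifics shown in the talk; the damping factor and iteration count are the usual illustrative defaults.

```typescript
type Graph = Record<string, string[]>; // node -> nodes it links to

function pageRank(graph: Graph, iterations = 20, damping = 0.85): Record<string, number> {
  const nodes = Object.keys(graph);
  let rank: Record<string, number> = Object.fromEntries(nodes.map((n) => [n, 1 / nodes.length]));

  for (let i = 0; i < iterations; i++) {
    const next: Record<string, number> = Object.fromEntries(
      nodes.map((n) => [n, (1 - damping) / nodes.length])
    );
    for (const node of nodes) {
      // Each node propagates its current value to its neighbors...
      const share = rank[node] / (graph[node].length || 1);
      for (const neighbor of graph[node]) {
        // ...and each node aggregates what its neighbors sent.
        next[neighbor] += damping * share;
      }
    }
    rank = next;
  }
  return rank;
}

const graph: Graph = { a: ["b", "c"], b: ["c"], c: ["a"] };
console.log(pageRank(graph)); // c ends up with the highest rank
```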
It then went into the specifics of Flink, Spark and Storm, but I didn't really manage to follow here, except that the strength of Flink is its decoupling of streaming and batch operations, while Spark does streaming through micro-batches.
Not really what I was expecting from dotScale here, but why not.
Sandeepan Banerjee
After that, Sandeepan Banerjee, ex-head of data at Google, talked about containers and how they deal with state.
His point was one we've all heard for the past two years: Docker is awesome, but it still does not answer the question of stateful data. How do you handle your dependencies in a container? How do you handle logging from a container? How do you run health checks on containers? Where do you store your container's database?
As long as your container only processes the data coming in and sends it back, you're fine. But as soon as you have to store anything anywhere, your container becomes tied to its surroundings. You cannot move it anymore.
When an issue is raised in production, it would be nice to be able to get the container back with its data to your dev environment and run your usual debugging tools on it, with the production data. It also raises the question of integration testing of containers. How do you test that your container is working correctly, when it is talking to other micro-services?
In a perfect world, you would be able to move a container with its data from one cloud provider to another, as easily as if there was no data attached. You could move it back to local dev as well and share the data with other teams so each can debug the issue in their service, with the same data from production. But we're not in an ideal world.
He then presented his solution, which still does not have a name: a way to organize containers by grouping them and moving the whole group at once. Even if each internal container is stateless, the group itself has knowledge of the associated data and of the relationships between containers, thus becoming stateful.
This solution would be based on a git-like approach, where the snapshots would not only contain relationships and data, but would also include a time dimension. This would allow you, just like in git, to get back in time and reload the snapshot like it was two days ago, and share a unique reference to that moment in time with other teams. He also talked about using the same semantics of branches and merges across snapshots.
He went even further, talking about a GitHub-like service where you could push your container snapshots and share them, versioning your infra as you go and integrating it into CI tools to deploy a whole network of containers with a simple commit.
Overall this looked like a very promising idea, even if we are really far from it. A few other talks during the day mentioned the git approach of branches and unique commit identifiers as a way to solve a lot of issues. I hope this project will be released one day, because if it actually fixes the issues that were raised, it will help greatly. Still, today it felt more like a wish list than anything real.
Oliver Keeble
The next talk was one of the most interesting of the day, one that puts your own scalability issues into perspective. It was about CERN's Large Hadron Collider, and how they manage to handle all the data they get from the experiments.
The LHC is the world's largest particle accelerator. It takes the form of a 27km-long underground tunnel where they collide particles at nearly the speed of light and analyse the results. It's an international scientific collaboration. When the collision happens, things happen, things disappear, and the only things left are the data the machines were able to capture.
Because such an experiment operates at a huge scale, they need to capture as much data as possible per experiment. They have more than 700 tons of detection devices that record data 40 million times per second. That is a lot of data to capture, send and process.
Because there is so much data, they know they won't be able to capture all of it, so they have to make estimates based on what they captured, and extrapolate from that. To do so, they have to calibrate the machines with empty tests, to see the amount of data they manage to capture from a known experiment. Then, it's just drawing a statistical picture for when they will do it with more data.
In addition to data that can fail to be recorded, data transmission can also fail. Sending so much data from the source of the experiment to the machines that will process it will obviously incur some data loss in the transfer. To counter that, they have a validation mechanism to check that the received data is not corrupted. It makes the transfer slower, but is needed because every piece of information counts.
They also dynamically throttle the data sent. They analyze in real time the success rate of the checksum validation: if the rate improves, they push more data; if it decreases, they push less, dynamically adjusting the throughput to be as efficient as possible. As the speaker said:
You'll always be disappointed by your infrastructure, so better to embrace it.
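To illustrate the throttling idea, here is a toy sketch that watches the checksum success rate and adjusts the send rate. The exact policy CERN uses was not detailed in the talk; the additive-increase/multiplicative-decrease scheme and the thresholds below are assumptions.

```typescript
class Throttle {
  private rateMbps = 100; // current target throughput, arbitrary starting point

  adjust(successRate: number): number {
    if (successRate > 0.99) {
      this.rateMbps += 10;   // transfers look clean: push a bit more
    } else if (successRate < 0.95) {
      this.rateMbps *= 0.5;  // too many corrupted transfers: back off hard
    }
    this.rateMbps = Math.max(10, this.rateMbps); // never stop sending entirely
    return this.rateMbps;
  }
}

const throttle = new Throttle();
console.log(throttle.adjust(0.999)); // 110 -> ramping up
console.log(throttle.adjust(0.90));  // 55  -> backing off
```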
Because of its international structure, the analysis of the data is done in parallel all over the globe, on what they call the grid. Machines in any research lab can help compute the data. They register in a queue of available machines, announcing their current load. CERN, on the other end, submits jobs to another queue and assigns each job to a matching machine. Once a job is finished, the machine calls home with its results, and the final results of all jobs are merged in a new queue.
It takes 25 years to build such a collider, and they are already building a new one, a 100km ring around Geneva, with which they expect to capture 10 times more data in the future.
That was the kind of talk I like to see at dotScale. Real world use cases of crazy scalability issues and how to solve them.
Lightning talks
After the lunch break, it was time for the lightning talks. The overall quality was much better than at dotSecurity.
Datadog
The first talk was by a guy from Datadog, but he never advertised his company directly. Instead he used the same colors as the company logo, and talked about a dating website for dogs. You know, a place to date a dog… It was not completely obvious, so fair enough :)
His talk was about the need to monitor everything, not only some key metrics, because you never really know what metrics you should be careful about. By logging more and more data, you can correlate parts of your infra/business together. The more you monitor, the more you get to know your data and you can refine non-stop what you should be looking at.
Their pricing model is based on how many metrics you keep track of, so I'm not sure how biased his speech was, but the idea is interesting nonetheless.
CRDTs
Then Dan Brown came back (he was here the year before as well) to talk about CRDTs, or Conflict-free Replicated Data Types. They give a set of rules for reconciling conflicts between two different versions of a piece of data in an eventually consistent architecture.
By defining a standard set of rules, conflicting merges can be reasoned about with the same outcome by everybody. When you have two changes to a boolean value, true wins by default. When changing the value of a counter, always take the most positive (or most negative) value. When changing the value of a simple key/value pair, the last write wins. For arrays, if you happen to have several conflicting add/remove operations, give priority to the additions.
This does not avoid conflicts, but it gives a commonly known set of rules to handle them automatically, without requiring manual intervention.
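As an illustration, here is a minimal sketch of the merge rules listed above, assuming a very simplified model. Real CRDT implementations track vector clocks and per-element tags; the function names here are mine, not from any particular library.

```typescript
// Boolean: true wins.
const mergeFlag = (a: boolean, b: boolean): boolean => a || b;

// Counter: keep the value furthest from zero (most positive or most negative).
const mergeCounter = (a: number, b: number): number =>
  Math.abs(a) >= Math.abs(b) ? a : b;

// Key/value register: last write wins, decided by a timestamp.
type Register<T> = { value: T; writtenAt: number };
const mergeRegister = <T>(a: Register<T>, b: Register<T>): Register<T> =>
  a.writtenAt >= b.writtenAt ? a : b;

// (The "additions win" rule for sets needs per-element tags to detect which
// removals were concurrent with which additions, so it is left out here.)

console.log(mergeFlag(true, false));                      // true
console.log(mergeCounter(-5, 3));                         // -5
console.log(mergeRegister({ value: "a", writtenAt: 1 },
                          { value: "b", writtenAt: 2 })); // { value: "b", writtenAt: 2 }
```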
Scaling bitcoins
Then came Gilles Cadignan, explaining how to make a proof of existence using bitcoin. He started from the assumption that everybody knew what bitcoin is and roughly how it works. That was my case: I still only roughly get how it works.
In a nutshell, he explained how to embed document hashes into the bitcoin blockchain to prove that a document existed at a given date. Not sure I can give more explanation than that, sorry.
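For what it's worth, here is a rough sketch of the first half of the idea as I understood it: compute a fingerprint of the document, which is what would then be anchored in a bitcoin transaction (commonly via an OP_RETURN output, though the exact method used in the talk is an assumption on my part). Only the hashing step is shown.

```typescript
import { createHash } from "crypto";
import { readFileSync } from "fs";

function documentFingerprint(path: string): string {
  const content = readFileSync(path);
  // The 32-byte SHA-256 digest is what would be embedded in the chain; anyone
  // holding the original document can recompute it later and compare.
  return createHash("sha256").update(content).digest("hex");
}

console.log(documentFingerprint("./contract.pdf")); // hypothetical file
```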
Monorepos and many repos
Then, Fabien Potencier from Symfony explained how the monorepo organization of Symfony works. They currently have 43 projects in the same monorepo. They use this repo for development, not for deployment, and have scripts to split this big monorepo into individual repositories.
They actually use both one big monorepo and many small repos. I'm not sure I got the story completely straight, because when I talked about it with other attendees, we all understood something different. But what I got was that the monorepo is the source of truth; it is where all the code related to the Symfony project is hosted. It makes synchronizing dependencies and running tests much easier, as everything is in the same place.
But they still automatically split this main monorepo into as many repos as there are projects. Those split repos reference the same commit ids as the main monorepo. By doing so they keep a logical link between commits in the two repos, while being able to give people specific ACLs on specific parts of the project.
What I did not understand is how you contribute. Do you submit a PR to the main monorepo that then trickles down to the split repo, or do you submit a PR to the small repo, which in turn submits a PR to the main one?
Juan Benet
After that, it was time to get back to the real talks. The next one was by Juan Benet, about IPFS, the InterPlanetary File System. One of the most awesome and inspiring talks of the day. To be honest, there was so much awesome content in it that it could have lasted the whole day.
He started by reminding us how bad the whole concept of centralizing data is. Distributed data is not enough: you just create local copies of your data, but they still all depend on a central point. What we need is decentralization. We have more and more connected devices in our lives, but it is still not possible, as of today, to easily connect two phones together, directly. We still have to go through external servers to send files. Considering the computing power of the devices we have in our pockets, this is insane. Those devices should be able to talk to each other without requiring a third party.
Centralization is also a security issue. Even if the connection is encrypted, the data stored on the machine is not. It has to be stored in plain text somewhere, which means it can be accessed, copied or altered.
Most of the time, we cannot access the data directly. We always have to use a specific application to get to it. There is no way to connect directly to the database of a distant server, for example; we have to go through their API (when it exists).
Webpages are getting bigger and bigger, but we do not really care because we have faster and faster data connections. Except when we don't. When we are in the countryside or in the subway, for example. Or living in a country where the data connection infrastructure is not as good. And we're not only talking about third-world countries here, but about any country after a natural disaster such as an earthquake. It's in those very specific moments that you really need your means of communication to work. There are also human disasters to take into account: government oppression, freedom of speech to defend...
In our current web, URLs are not eternal. Links break all the time because websites evolve, or because URL shorteners are no longer maintained.
Well, that's a huge list of problems that I can only agree with. All of this comes from the same root cause: all online resources are tied to an address. The internet is an awesome pool of knowledge, but it is still very awkward and old-fashioned in certain aspects. 50 years ago, if I told you "Oh, I read this book, it is awesome, you should read it too", you just had to go to the closest library and borrow the book. Today, with the way the internet works, you would have to go to the very specific library I point you to, even if the same content could be found somewhere else. We are not pointing at content anymore, but at addresses where that content can be found.
It's a big problem, and there is a way to fix it that draws its inspiration from git. SVN was centralized, and it exhibited all the issues I talked about earlier. Git and its decentralized model fixed them all.
IPFS would be like git, but for content. Each object would be linked to another through crypto hashes, like commit hashes, and that way the integrity of the whole is ensured. You couldn't just change something in the chain of commits without everybody knowing. And when you talked about a specific file, through its hash, everybody would know exactly what you were talking about. Because it would be distributed, there would not be any central source to get data from; you could get it directly from any of your peers that has it as well.
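Here is a tiny sketch of content addressing, the core idea behind this: the address of a piece of content is a hash of the content itself, so anyone can verify what they received, and it no longer matters which peer served it. Real IPFS uses multihashes and Merkle DAGs; this simplified in-memory store is only for intuition.

```typescript
import { createHash } from "crypto";

class ContentStore {
  private blocks = new Map<string, Buffer>();

  // Adding content returns its address: the hash of the bytes themselves.
  add(content: Buffer): string {
    const address = createHash("sha256").update(content).digest("hex");
    this.blocks.set(address, content);
    return address;
  }

  // Retrieval by address lets the caller re-hash and verify integrity.
  get(address: string): Buffer | undefined {
    return this.blocks.get(address);
  }
}

const store = new ContentStore();
const address = store.add(Buffer.from("hello, decentralized web"));
console.log(address);                        // same content always gives the same address
console.log(store.get(address)?.toString()); // "hello, decentralized web"
```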
There are already a lot of P2P systems that work through this logic, like bitcoin. OpenBazaar is an eBay-like marketplace, distributed and using bitcoin as its currency. The beauty of such a system is that you can still add some central nodes, like GitHub does for git projects, but they won't become SPOFs; they are just there for performance and convenience.
libp2p is a library that integrates a lot of existing P2P protocols to build this IPFS. All of them are compatible and allow sharing documents through a forest of Merkle trees from various compatible origins.
This idea of using a git-like structure for various usages was discussed by a few speakers, but this presentation really went deepest into it. Two years ago everybody was talking about Docker in their presentations like it would be the next big thing; I really hope that IPFS will be a reality in two years. It would be one of the greatest improvements in the worlds of data sharing, hosting and security.
Such networks are already in place for some blogs, web apps and scientific papers. Following the talk of Diogo Monica at dotSecurity, such a system will also be of great value for package managers.
Such an awesome talk, really; it is my personal highlight of the whole day.
Greg Lindahl
The next talk was also one of my highlights. It was by Greg Lindahl, and explained how the Internet Archive works from the inside.
The Internet Archive is a non-profit library that aims to store everything produced on the internet. It contains more than 2 million videos, books and audio recordings, more than 3 million hours of television and 478 billion website captures. Overall, this weighs around 25 petabytes of data.
One of their main focuses is not losing any data, ever. They store the data in two geographic locations, and on two different physical disks in each location. They extract metadata from the documents and have a centralized server that contains only the metadata and information about where to find the complete file.
Metadata information is also stored alongside the data itself, so if the centralized server gets corrupted, its content can still be reconstructed from the raw data.
They get their initial data from various crawlers; some are their own, others come from external sources, either professional (like Alexa) or volunteers.
They have a lot of what they call derived data. From a given source, they can run OCR, extract a transcript, or produce other kinds of data. To do so, they need to operate on a copy of the original data, to avoid corrupting it in case something goes wrong.
They currently have issues with their search, which is not very efficient. They manually build an index of all their resources in a text file that is 60TB compressed. It is so big they have to build an index of the index itself.
In the future they would like to partner with major browsers to provide a fallback to a previous version of a website in case of a 404 error. But before being able to provide that, they need to figure out how to scale the lookup time.
They are currently operating at maximum capacity. They plan to move to SSDs, but this is still too expensive. They are also investigating migrating to a NoSQL database to replace their manual text index file. It currently takes 100 days just to index the content. But this is an old project, with hacks built on top of hacks built on top of hacks. Changing the stack means porting all the (undocumented) hacks. Technical debt is not about technology; it's about people, and how they have used the technology over the years.
As the speaker said, all of their projects, even the smallest ones, can be considered big data. They tried to move to Elasticsearch, but 100GB of their data exploded to 12TB once put into ES.
Surprisingly, they do not receive that many takedown requests for copyright infringement, but this is mainly because their search is so bad that it is nearly impossible to find anything. I especially liked this quote from the speaker about that:
First thing you do when building a website search engine with ElasticSearch is to replace the default ES algorithm.
Really nice overview of how this massive project operates. The amount of data stored is extraordinary, and I've always wondered how a non-profit managed to handle this. The answer is: they struggle.
Sean Owen
Then came a talk from Sean Owen, from Cloudera. He mainly talked about machine learning in the context of content recommendation for music genres. It quickly became too technical for me to follow the main idea.
He explained that what started as a machine learning project quickly became a big data project. They realized that to build a relevant content recommendation system, they would need data from their users. They use Spark to handle the processing of the data, because the technology is apparently well suited to scaling this kind of recommendation and machine learning problem.
As I said, he quickly started giving a mathematical explanation of how to compute two matrices of music genres, with specific tips and tricks to make them faster to compute. I cannot remember the exact tricks of this specific use case, except that he suggested always using Spark's native math libraries instead of the JVM for anything math-heavy.
So yeah, sorry this recap is not as thorough as the others, that's really all I can say about it.
Spencer Kimball
The next talk was a joke. Spencer Kimball was supposed to tell us more about CockroachDB, but he seemed to have a plane to catch. He did in less than 20 minutes what was supposed to be a one-hour talk.
We barely had time to read a slide before he jumped to the next. The only thing I grasped from his talk was that he knew what he was talking about. Good for him, because honestly, I think he was the only one.
Gleb Budman
Fortunately, the next talk was its exact opposite and was one of my favorites of the day. Gleb Budman, from BackBlaze, gave an overview of how they built BackBlaze from its first days.
BackBlaze is a cloud backup system. They offer unlimited backup for $5/month (unfortunately not compatible with Linux, otherwise I would already be a customer). What is really interesting is their approach to building such a system, as an iterative process.
They first had to choose where to store the data. They couldn't go to Amazon because for $5 a month they could only get 30GB, and they needed to offer infinite storage space, because you never know how much data you'll have to save. And you want a backup system that "just works", without worrying about disk space.
So they started to investigate buying their own hardware. They quickly realized that they could buy a 1TB drive for $100 in a local shop, while the same capacity in its server-grade version would cost more than $1,000.
They had never imagined they would end up building their own hardware. Amazon already does that, and there was no way they could be more cost-efficient than Amazon. But it was either try, or close the company. So they tried.
They started with external drives and cascading USB hubs. Didn't work.
So they started thinking about what they did not need. They are only storing data, so they do not need to run applications on their hardware. The data will only be accessed from time to time, not continuously. The various machines do not need to talk to each other directly.
But they do need to be cost- and space-efficient. They need to store as much data as possible in the smallest possible space, without being too expensive. So they bought hard drives in bulk. Not high-quality ones, but commodity parts. They created a custom case, made out of wood, to host as many drives as possible.
This worked, but didn't scale very well. They would need more and more of those cases if they wanted to grow. So they decided to do some Agile, but for hardware. They contacted a company that could build a metal case from the wooden prototype. And in small iterations of 4 weeks, they managed to learn a lot about what they needed, by testing constantly and improving each case with the knowledge gained from the previous one. They tested different shops, different 3D printing techniques, and regularly improved the hard drive case.
But then developers started to ask for an API to directly access the data. So they needed to keep iterating on the case, but now holding hardware that could both store the data and run a webserver.
They managed to fit 60 hard drives in a box, or what they call a pod. Because it's commodity hardware, it will fail. So they replicate the data across the drives. For each chunk of 20 drives, they can afford to have 3 of them down at any time. The data is replicated with enough redundancy that they can rebuild the missing pieces as long as 17 drives are working.
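A quick back-of-the-envelope on that 17+3 scheme (Backblaze has since described it publicly as Reed-Solomon erasure coding, but the numbers here are just the ones quoted in the talk):

```typescript
const dataShards = 17;   // drives holding user data per group
const parityShards = 3;  // drives holding redundancy per group
const totalShards = dataShards + parityShards;

// Storage overhead: raw disk needed per byte of user data.
const overhead = totalShards / dataShards;
console.log(overhead.toFixed(2)); // ~1.18, versus 2.00 or 3.00 for full replication

// Tolerated failures: any `parityShards` drives in the group can die at once.
console.log(`Survives up to ${parityShards} simultaneous drive failures per group of ${totalShards}`);
```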
He then gave more details about some real-life issues they had when building the pods. Like forgetting to drill a hole for the power button, or applying a coat of paint that was too thick and prevented the case from fitting in the datacenter. Or how a flood in Thailand stopped the worldwide production of hard drives, and how they had to go buy external drives by hand and rip them open to get the hard drives to put in the pods.
Overall, his talk was really inspiring, telling us that there is nothing more important than field knowledge. Go test your product in the field as soon as possible. It works with hardware as well as with software. Find ways to build, even if imperfect; you will learn and improve over time. Toss the unnecessary, don't buy things that are too expensive and provide features you don't need. Create prototypes fast, and often.
Ted Dunning
Then Ted Dunning talked about real-time processing, and how streams are the future. The talk was really meta, high-level and hard to follow.
He started by saying that everything in the world works in a symmetric way, with an associated conservation law, like energy and time. And that this is true both in the microscopic and the macroscopic world. He then threw a few dollar bills on the floor, telling us that if we had to pick them up, the cleverest way would be to pick the highest-value bills first. I told you it was meta.
The idea behind the bill show is that, if bytes were money, we should pick the best bytes first. A data processing system should take the most interesting data first, and then continue with less and less interesting data. This creates a graph where the value we get from the data is high at first, then hits a plateau where we get less and less value per byte. The flatter the graph gets, the more expensive it becomes to extract value.
He then went on to say that data should never be altered, that data should only be reasoned about in terms of now, before and after, that all data processing is actually a stream, and that data will just keep growing, so scaling is inevitable.
He also talked about the communication cost of developers. In a perfect world, putting more developers on a task should make it finish faster. In practice, it adds a communication cost. There is an optimal number of developers for a given task, where the boost in productivity outweighs the cost of communication. Each company, and even each project, should find this optimum.
He concluded by saying that even if REST is nice, and you can technically do anything with it, it still creates a communication cost between all the nodes that need to communicate. On the other hand, streaming (through something like Kafka) removes the need for coordination or acknowledgement, and will win on this front in the future.
To be honest, I had a hard time following where he wanted to take us with this talk. I am still not sure what his point was.
Eliot Horowitz
The next talk was easier to follow, even if I didn't really share the excitement of the speaker. Eliot Horowitz is the co-founder and CTO of MongoDB.
He talked about how we all already have microservices in the form of third parties in our infra. We have analytics in GA, project management in JIRA, and various tools for NPS, emailing and a lot of other things.
The issue with that is how to keep the data consistent across all those services, and more importantly how to search across all of it at once. Each service holds its own part of the data and sees it through its own system. It is very hard to get a big picture of it all and link the data together.
You could build a regular report by fetching data from all sources, aggregating it and outputting a coherent set of values. This takes time and money, and the result is frozen in time. It is not a way to search and explore your data.
You could build a data warehouse, where you put all the structured data you got from building the reports. That would let you search in it. But every time you add a new source or change the schema of the data, you create inconsistencies in your warehouse.
On the other hand, you could build a data lake, where you put everything, with various schemas. Unfortunately, the lake will quickly become a dump. You're actually just building a drowned warehouse.
So let's start again, and let's not assume you need to put all the data in the same place. By keeping the data where it is, you offer more flexibility in how it is used. You can add new services that handle new types of data.
Mongo Aggregate is the service he wanted to tell us about. It is a new pipeline in MongoDB that lets you query data from various (external) sources. It is built to let you link several cohorts: for example, knowing whether the users that did something in system A are the same ones that did something else in system B. Are the users that read the release notes the same ones that give a high NPS?
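The external-source querying itself was not detailed in the talk. The closest public MongoDB feature I know of is the $lookup aggregation stage, which joins two collections; here is what linking two cohorts looks like with it, using hypothetical collection and field names.

```typescript
import { MongoClient } from "mongodb";

async function releaseNoteReadersWithHighNps(uri: string) {
  const client = await MongoClient.connect(uri);
  const db = client.db("analytics");

  // Cohort A: users who read the release notes, joined with cohort B: NPS answers.
  const results = await db.collection("releaseNoteViews").aggregate([
    {
      $lookup: {
        from: "npsScores",      // the other cohort's collection
        localField: "userId",
        foreignField: "userId",
        as: "nps",
      },
    },
    { $unwind: "$nps" },
    { $match: { "nps.score": { $gte: 9 } } }, // promoters only
    { $project: { userId: 1, "nps.score": 1 } },
  ]).toArray();

  await client.close();
  return results;
}
```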
To do so, mongo has to actually query these external services. A lot. But because the whole process of getting a subset of users from one system and checking in another system whether they share a common attribute is wrapped inside mongo, they can optimize the query on several levels. How? I don't know; I think he only mentioned caching. I wonder how this handles third-party failures.
I'm always highly skeptical about tools branded as "you can do anything with it, we just take care of everything". Connecting to a third-party API is trivial, except when it fails. And it fails a lot, so having all of this hidden inside MongoDB does not seem like an attractive feature to me.
Eric Brewer
Last talk of the day was by Eric Brewer, VP of Infrastructure at Google. He talked about immutable containers.
For him, we should really not focus on the containers, but on the services they are running and exposing. We should not care about which machine is hosting a container, we should just care about the service it exposes.
He discussed grouping containers into pods that also share a data volume. All containers of a pod are colocated on the same machine, so each knows the IP address and exposed ports of the others.
You then start to build services on top of pods. The pods are stateless and self-contained, even if their internal containers are stateful. Now you can just duplicate the pods and put a load balancer in front of them. Pods are interchangeable because they are immutable: you can restart one at any time, and you name them based on the service they host, not their inner components.
If we go one level higher, a network of pods is also immutable, because each of its parts is immutable. Because of this, you can build your own infra and test that it works correctly before actually deploying it. When you build, you are really just building a graph connecting all your pods.
This all looks well on paper, but I often hear that immutability is the answer to all software problems, like asynchronous code was a few years ago, so this just looks a bit too magical.
Anyway, he finished with a demo of how he could do a hot version update on hundreds of pods without downtime, by taking each idle pod out of rotation, one after the other, and replacing it with one running the new version.
He also ended on an interesting note about hard drive and datacenter reliability. In the future, we will store less and less data on our laptops and more and more in the cloud. Drives in datacenters will become the main storage point, and they are already pretty reliable today, often more reliable than the rest of the datacenter around them. So there is no need to further improve their reliability; instead we should reduce their size, make them cheaper, and make them use less energy. In the end, they only need to be as reliable as the datacenter that holds them.
Conclusion
And that's it, the end of the day. It was a pretty long one, especially considering that we also had dotSecurity the Friday before. Next year there will even be a dotAI the day after. Not sure I'll be able to do three days in a row.
The main subjects were containers and how they handle state: how going fully stateless is not actually possible, and how containers should embrace their data, through pods or other grouping patterns. Merkle trees and git-like structures were also a hot topic, and one that got my attention.
Overall I have the same feeling as after dotJS. Lots of talks, but only a few really inspiring ones. Those talks are awesome, don't get me wrong, and they make the conference worth it, but the ratio is less interesting than at dotSecurity, for example. I think I will come back next year, but I'm not sure I'll go to dotAI.
First session of the dotSecurity conference this year. I've been to a lot of dotConferences in the past years, and they always take place in the most gorgeous venues. This one was no exception. As usual, there is also no wifi in the theater, and attendees are invited not to use their laptops.
Obviously, I did not follow that advice, otherwise you wouldn't be able to read this short recap.
Making HTTPS accessible
The first talk of the day was by Joona Hoikkala, about https and the hassle it is to properly install and configure it on a server. The overhead is getting so low, and the benefits so high in terms of security and privacy, that the whole web should be https. It protects end users from session hijacking, content injection and arbitrary censorship.
And yes, I get the irony that, at the time of writing, this very website is not served over https. I'll fix that soon.
Another advantage is that HTTP/2 only works with https. Well, according to the spec it could also run without it, but in practice no current implementation handles that case; they all need https enabled. It is also said to give better SEO results, but as with all things SEO-related, I wouldn't put much faith in that.
Still, the issue today (and that will be my excuse for not having https on this very website) is that it is hard and cumbersome (and expensive!) to get a certificate for each server we own. We could default to a wildcard certificate (something like *.pixelastic.com), but this just lowers the overall security, because if one server gets compromised, all of them are insecure.
The solution today is to use Let's Encrypt, which gives easy and free certificates to everybody. It comes with a command-line interface and eases the burden of creating, revoking and renewing certificates. More than 2 million certificates have been issued through Let's Encrypt since its creation.
Overall, I must say that this first talk was a bit slow to start. It took me a while to understand that the speaker actually worked for Let's Encrypt. If anything, it gives me one more nudge to move my servers to https.
Life of a vulnerability
The second talk was by Filippo Valsorda, from Cloudflare. He explained in very simple terms how vulnerabilities (or vulns) are discovered, published and fixed. A really nice overview of something I only hear about from a distance, without knowing how the process really works.
He started by defining a vuln as basically a state of software that lets users do something they are not allowed to do. The most common forms include DDoS, data leaks, SQL injection and remote code execution.
Not all vulns have a nice name, website, logo and stickers. Most of them don't, actually. When you discover a vulnerability, you have to report it. There are two ways to report them: full disclosure and responsible disclosure.
For full disclosure, you post the vulnerability to a public forum, letting everybody know about it. Such forums are often the target of legal threats from companies that don't want their vulnerabilities publicly accessible. Other people choose to sell them instead. Who actually buys those vulns is still a bit obscure to me, but it can be people buying them to resell later, or to use for their own profit (either governments or mafias).
For responsible disclosure (also known as coordinated disclosure), you do not post publicly, but contact the owner of the website or service impacted. You tell them about the issue, the risks and how to fix it. This gives them time to fix the issue before it does too much damage. That is why it is very important for any website or open-source project to have a way to contact the team for security reasons. It can be a simple security@domain.com email address, but it shows that the team will take the matter seriously.
Also, being publicly acknowledged as having found a vuln on major websites like Google or Facebook is very nice for exposure and for building a reputation as a security expert.
Once the vuln is identified, it should be assigned a unique number so everybody knows they are talking about the same issue and can work in a coordinated effort to fix it. There are hundreds of vulnerabilities issued per month; some target only a very specific version of a very specific framework, while others can have a much bigger impact.
Such identified vulns are publicly announced with the list of affected versions of the software, an explanation of the issue and its real-world impact. Patches and/or possible fixes are displayed alongside, as well as credits.
Then he ended his presentation with something I found a bit weird to do at such a conference. He did a live exploit of a known vulnerability in an old version of Ruby on Rails to get remote code execution. He used Metasploit and Shodan to show perfectly how easy it is to find an insecure machine and run a known exploit against it.
Sure, the machine he targeted was his own, but it wasn't really clear at first. I don't think it was such a good idea to show the audience how easy it is to get access to distant machines and start breaking them, without telling them about the legal implications of doing so (it could land you in jail in some countries). I would have appreciated a bit more context here.
Still, very good talk that taught me a lot.
Multi-factor authentication
The following talk was by Jacob Kaplan-Moss. He told us everything about multi-factor authentication (which is a fancy way of saying two-factor authentication; I didn't get the difference).
Logging in to a website with just a password is no longer enough. We need a secondary channel to confirm that the person doing the action is actually the one we think it is. Computers with saved passwords can be stolen or left unlocked.
For 2FA, we need another object that is owned by the user trying to do a sensitive action. Most of the time it is a mobile phone with a 2FA application that displays a small code you have to input on the website, but there are other ways.
You could send an SMS to the user, or give them an automated phone call with a special number they need to input. This is called out-of-band communication: you request information from the user in channel A that was sent to them through channel B. This way you make sure you're not talking to a fake user impersonating the real one after stealing their laptop.
There are two types of tokens you can exchange that way. Soft tokens are the ones generated by 2FA apps like Google Authenticator or Authy. Hard tokens are physical objects like YubiKeys.
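As a side note on how soft tokens work: they are TOTP codes, an HMAC of the current 30-second time window truncated to 6 digits. Here is a minimal sketch following RFC 6238/4226; real apps also handle base32-encoded secrets, clock drift windows, and so on, and the shared secret below is obviously a made-up example.

```typescript
import { createHmac } from "crypto";

function totp(secret: Buffer, timestampMs = Date.now(), stepSeconds = 30): string {
  // 8-byte big-endian counter: how many 30-second steps since the Unix epoch.
  const counter = Math.floor(timestampMs / 1000 / stepSeconds);
  const counterBuf = Buffer.alloc(8);
  counterBuf.writeBigUInt64BE(BigInt(counter));

  const hash = createHmac("sha1", secret).update(counterBuf).digest();

  // Dynamic truncation (RFC 4226): pick 4 bytes based on the last nibble.
  const offset = hash[hash.length - 1] & 0x0f;
  const binary = hash.readUInt32BE(offset) & 0x7fffffff;

  return String(binary % 1_000_000).padStart(6, "0");
}

// Both the server and the phone share the secret, so both can compute the
// same 6-digit code for the current window and compare.
const sharedSecret = Buffer.from("12345678901234567890"); // hypothetical secret
console.log(totp(sharedSecret)); // e.g. "492039", changes every 30 seconds
```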
All those solutions offer various trade-offs in risk, UX and cost. When choosing a solution, you should not focus only on the risks. If you pick only the least risky solution, you often end up with the most miserable UX, and users will end up not using your ultra-secure solution because it is just too damn hard to use.
When assessing risk, you have to check whether the tokens can be intercepted or brute-forced, and whether you can notice when a token has been stolen. You also have to check how the app or hardware is protected against malware.
When assessing costs, you have to take into account both the cost for the company (how much does it cost to give YubiKeys to everybody?) as well as the cost for the end-user (will it be cumbersome, will they lose time doing it, etc.).
For example, out-of-band communication through SMS is easy. Most of your users already have a phone, and they know how to use SMS. But it is also quite easy to intercept an SMS, and such a channel can easily fail in countries where phone network coverage is bad.
For soft tokens through authentication apps, it is much harder to compromise the tokens (even if it is possible), and you don't need a network since they work offline. Still, there is a UX cost, because you need to download an external app and manually input the code from your phone.
For hard tokens, the real threat is that the master key that validates the tokens of all physical keys gets stolen. There is no risk-free solution, and even if such an event is unlikely, if it happens you need to build new keys for everybody and ship them wherever they are in the world. Cost-wise, it can become really expensive. But in terms of UX, this is the best: you just have to press a button to prove that you are you.
Whatever method you choose, you have to ask for 2FA not only when users log in, but whenever they request a sensitive action. This could be adding new members to a team, or changing passwords or billing information. You should also monitor for weird behavior, like changes in connecting devices or geolocation.
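As an illustration of that last point, here is a hypothetical sketch of "step-up" authentication: a sensitive action only goes through if a 2FA check happened recently, regardless of the login session. The names and the 5-minute window are made up for the example.

```python
# Hypothetical sketch: sensitive actions require a 2FA check more recent
# than a short grace period, even if the user is already logged in.
import time

STEP_UP_WINDOW = 5 * 60   # a 2FA check stays valid for 5 minutes (arbitrary)

def require_recent_2fa(session: dict, code_is_valid: bool) -> bool:
    """Return True if the sensitive action may proceed."""
    if time.time() - session.get("last_2fa_at", 0) < STEP_UP_WINDOW:
        return True                      # checked recently enough
    if code_is_valid:                    # e.g. the totp() check from the sketch above
        session["last_2fa_at"] = time.time()
        return True
    return False                         # prompt the user for a fresh code

session = {"last_2fa_at": 0}
print(require_recent_2fa(session, code_is_valid=False))  # False: ask for 2FA
print(require_recent_2fa(session, code_is_valid=True))   # True: code accepted
print(require_recent_2fa(session, code_is_valid=False))  # True: still within the window
```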
You also need a backup plan. What happens when people lose their phone or key? How can they connect to the service again? There is no perfect answer to that question. You could provide users with backup recovery codes, but then they must store them securely somewhere, which just moves the security burden onto average users. Most won't store them at all.
You could allow a backup phone, but this only increases the attack surface, making everything less secure. You could allow the support team to help, but they then become open to social engineering. Or you can simply state that there is no way to recover an account if you lose your authentication method. Whatever method you choose, you have to tell your users upfront, or they will be really pissed off when they discover there is no way to get their account back.
The speaker's suggestion was the following setup, which differentiates between public accounts and internal accounts:
For public accounts, 2FA apps are the best trade-off between UX and security. Authy has a better UX than Google Authenticator, but is more costly for the company. You also have to request the token on each sensitive action. For recovery, don't allow support to recover an account; provide backup codes instead.
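Backup codes are cheap to generate, by the way. The sketch below (sizes and names are arbitrary) shows the usual idea: create random one-time codes, show them to the user exactly once, and store only their hashes server-side.

```python
# Minimal sketch of generating one-time backup codes, assuming the server
# stores only a hash of each code and marks it as used after redemption.
import hashlib
import secrets

def generate_backup_codes(count: int = 10) -> list[str]:
    return [secrets.token_hex(5) for _ in range(count)]   # e.g. 'a3f09c1b2d'

codes = generate_backup_codes()
hashed = {hashlib.sha256(c.encode()).hexdigest() for c in codes}  # store these
print(codes)  # show these to the user once, then never again
```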
For internal accounts, you can greatly improve both security and UX by putting a bit more money on the table and giving a YubiKey to each employee for 2FA. This will prevent outsiders from taking over an employee account. Add behavior analysis, like detecting when a connection does not come from a known workplace, and trigger a 2FA check. Also, only allow a 2FA reset when it is requested face to face with the security team.
I really liked a quote from the speaker during the short Q&A session that followed:
Security should not be handled by a security team. It should belong to the day-to-day developer, just like tests should no longer be in the hands of testing teams but embedded into development through things like TDD.
Lightning talks
After those talks and a break, we continued with a small commercial break, a.k.a. the lightning talks.
The first speaker presented, in the form of a git branching workflow, what he called the classical "security audit workflow". You request a security audit on your code, so the security team creates a new security branch to start working on it. During that time, the main develop branch keeps growing, because devs don't stop coding during the audit. Then it's time for the release, so develop gets merged into release. But the security audit wasn't finished, so no security fix was pushed to production. And once the audit is done, the code on develop has changed so much that the audit is useless.
His solution to that was to do runtime checking of the code, through his company called Sqreen.
From my point of view, the whole premise was fucked up. He was giving a solution to the wrong problem. Why would you even start a security audit if you're not going to implement its results before shipping? That is the root cause of the issue. Either integrate security testing into the development flow, fix the issues before the release, or open a public bounty program. There are so many other ways to avoid the absurd branching model shown on screen.
I'm not saying the Sqreen product is bad (we even used it at some point), just that what it was trying to solve was not a tech issue but an organizational one.
Then Ori did some Ori on stage, showing dead squirrels and demonstrating a Slack bot to publicly shame sudo users.
Finally a guy from OpenCredo tried to explain something, but as he was constantly looking at the slides behind him, I couldn't hear what he was saying.
I'm a bit disappointed by the lightning talks of this edition. They looked too much like the bad sessions of the ParisTechTalk meetup, where talks are only an excuse to sell a product or a company. This is especially surprising because all the other talks at all the other dotConferences take great care to avoid pitching their companies.
Content Security Policy
We then came back to the real talks with Scott Helme and CSP (Content Security Policy). I discovered CSP last October at ParisWeb, so nothing was really new for me here. I still think it's something I should use more, and it got me thinking about many possible usages.
But first, let's explain what it is. Basically, it's a new header (Content-Security-Policy) that you add to your responses. Its value is a long string describing a policy.
It tells the browser what it can or cannot do when loading external resources, and browser support is actually already quite good. You can tell the browser to only load assets from your specific CDN and your main domain, to protect against XSS. You can fine-tune it to apply different rules to images, videos, scripts, styles, etc. You can also prevent your page from being embedded in an iframe.
By default it blocks the execution of all inline scripts in your page, forcing you to load them from an external file hosted on one of the whitelisted servers. Another cool feature is that it automatically disables the loading of http assets on an https page, avoiding the mixed-content warning. You can even force the browser to always upgrade http URLs to https.
But the two most interesting meta-features are these: you can toggle a report-only mode that does not block anything, but only displays warnings in the console. This lets you test the policy and see what would break before really deploying it. The other is a way to send those reports to an external server instead of the console, for later analysis.
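To give an idea of what such a policy looks like, here is a sketch of the header set from a Flask application; the domains are placeholders. Swapping the header name for Content-Security-Policy-Report-Only (plus a report-uri directive) switches it to the non-blocking report mode described above.

```python
# Sketch of the kind of policy described above, set on every Flask response.
# Flask is only used here to show where the header goes; the domains are fake.
from flask import Flask, Response

app = Flask(__name__)

CSP = (
    "default-src 'self'; "
    "script-src 'self' https://cdn.example.com; "   # scripts only from us + our CDN
    "img-src 'self' https://images.example.com; "
    "frame-ancestors 'none'; "                      # refuse being embedded in an iframe
    "upgrade-insecure-requests"                     # upgrade http:// assets to https://
)

@app.after_request
def add_csp(response: Response) -> Response:
    # Use Content-Security-Policy-Report-Only instead to get reports without blocking.
    response.headers["Content-Security-Policy"] = CSP
    return response
```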
I wonder how easy it is to test these CSP rules with statically generated websites (the Middleman/Jekyll type). I'm also wondering how much we could exploit this mechanism to send detailed information about the user directly from the browser, without any third-party tracking script.
I mean, if I turn on report-only mode and disallow the loading of all assets, I will receive a report for every asset the user tried to load. Doing so, I could for example figure out which users have an adblocker (because they never got "blocked" while trying to load an ad). Could I also load a script from a valid source that dynamically injects a custom script with all the user data (viewport, IP, user agent, etc.) in its URL, just to log it?
I'm sure there is really interesting data to get from this feature, as well as an increase in the overall security of apps, of course!
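For instance, a minimal (and purely hypothetical) report collector could look like the sketch below; the browser POSTs a JSON body containing a csp-report object, and logging its blocked-uri field is enough to start that kind of analysis.

```python
# Hypothetical CSP report collector. The browser POSTs a JSON body with a
# "csp-report" object whenever the policy blocks something; logging the
# blocked-uri is enough to spot, e.g., ad URLs that adblock users never request.
from flask import Flask, request

app = Flask(__name__)

@app.route("/csp-reports", methods=["POST"])
def csp_reports():
    # force=True because the browser sends the application/csp-report content type
    report = request.get_json(force=True).get("csp-report", {})
    print(report.get("document-uri"), "blocked:", report.get("blocked-uri"))
    return "", 204
```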
Diogo Monica
The next talk was my personal favorite of the whole afternoon. It was given by Diogo Monica, head of security at Docker. I did not get everything from the talk, to be honest, but it was explained in such a clear and concise way that I really liked it.
At its core, Diogo said that updates are one of the most, if not the most, important parts of security. When an issue is found and fixed, you have to update your software to actually benefit from the fix. And how do you update? Through your package manager.
So package managers are actually a very important part of the security chain, and any security issue in the package manager itself can compromise any package it handles.
All package managers today download updates through a secure connection. This offers great protection against man-in-the-middle (MITM) attacks, but it is not enough. If the source is compromised, you are still downloading an infected update through a secure channel.
The best way to protect sources is to sign them, usually through PGP. As a maintainer, you add your PGP "stamp" to the package, and you give your public key to anybody who wants to check your packages. When users download the package, they just have to check, with the public key, that the package was actually built by the maintainer. If an attacker changes anything in the package, the signature check will fail.
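To illustrate the sign-and-verify idea (this is not PGP itself), here is a sketch using an Ed25519 key pair from the Python cryptography library: the maintainer signs the package bytes and publishes the public key, and any tampering makes verification fail.

```python
# Sketch of the sign/verify idea behind package signing, using Ed25519 keys
# from the `cryptography` library as a stand-in for PGP.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

maintainer_key = ed25519.Ed25519PrivateKey.generate()
public_key = maintainer_key.public_key()        # published alongside the packages

package = b"...package contents..."
signature = maintainer_key.sign(package)        # shipped next to the package

try:
    public_key.verify(signature, package)       # any tampering raises here
    print("package genuinely comes from the maintainer")
except InvalidSignature:
    print("package was modified, refuse to install")
```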
But this won't protect you against what is called a "downgrade attack". The PGP signature only lets you check that the package you've downloaded actually comes from the maintainer.
Now imagine you're using a specific piece of software in v1. You've downloaded the binary from the source, checked that it really comes from the official maintainer, and installed it. Two weeks later, a vulnerability is found in this version. A v2 is released that fixes the issue. You install it through your package manager: it downloads it from the source, checks that it really comes from the maintainer, and installs it. You're safe!
Or maybe not. It only checked that the downloaded file came from the maintainer, not that it actually was v2. Maybe you just reinstalled the exact same software. Maybe one of the servers offering the download was compromised and served v1 instead of v2, and you didn't see it because the PGP check doesn't verify the version, only that the file comes from the correct maintainer.
An easy fix would be to look at the currently installed version. It sounds simple, but it is not PGP's job, and it is apparently quite difficult for package managers to check that the correct version is installed.
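As a hedged sketch of what that missing check could look like: if the signed metadata pinned the version, the client could simply refuse anything that is not newer than what it already has. The metadata format below is invented for the example.

```python
# Conceptual sketch: refuse any package whose (signed) version is not newer
# than the one already installed. The metadata format here is made up.
def refuse_downgrade(installed_version: tuple, signed_metadata: dict) -> None:
    offered = tuple(signed_metadata["version"])     # taken from signed data, e.g. (2, 0, 0)
    if offered <= installed_version:
        raise RuntimeError(
            f"refusing downgrade/replay: offered {offered}, already have {installed_version}"
        )

refuse_downgrade((1, 0, 0), {"version": [2, 0, 0]})   # fine, it's a real upgrade
refuse_downgrade((2, 0, 0), {"version": [2, 0, 0]})   # raises: same old version served again
```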
Diogo then talked about delegating power through a hierarchy of keys: one master key (stored offline) that can delegate power to other keys, which can themselves delegate further. Keys at the top of the hierarchy have long expiration times, while keys at the bottom have short ones.
He also talked about a way to sign a package with multiple keys, to validate for example that a specific binary has correctly passed the staging and preprod tests. I must say I did not get exactly how this part works, but it is all part of the way Docker manages its packages.
Really nice talk; I strongly encourage you to watch the video once it becomes available.
Anne Canteaut
The next talk was not a very technical one, but more mathematically oriented. It was given by a researcher in cryptography. She pointed out that most of the security breaches described in the previous talks were never about weaknesses found in the cryptographic parts.
Does that mean the crypto part of security is perfect? Far from it. If you listen to cryptanalysts, it seems like a lot of algorithms are broken on a regular basis. But what does that even mean? Should we worry? Cryptographers are known to be paranoid, so should we really listen to them?
Spoiler: yes.
If cryptographers say that something is broken, you should stop using it right now. Even if there is no practical way to exploit the weakness today, there will be one in the future. Those breaches never get better; they only get worse.
There are two ways in which a hash algorithm can be considered broken. The first is finding two inputs that give the same hash output. This is called a collision, and md5 is broken in that way. This is "quite easy" to do, but does not really reduce the security level in any significant way. It opens the way to other exploits, but is not too dangerous in itself.
The second way is far more dangerous. It is when you are able to find a preimage: given a specific hash output, you can craft your own input that produces that output. This differs from a collision in one important way: the output is arbitrary and already given to you. It's like the birthday paradox: in any group of people, it is easier to find two people with the same birthday than to find one person with the same birthday as you.
If you know how to craft a preimage, you can then create an infected version of a valid input that gives the same output as the valid one. If this happens, the algorithm is broken, and you should stop using it.
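As a toy illustration of why preimages matter, and of how hard they are to find by brute force, the snippet below searches for an input matching only the first 16 bits of an md5 output. Against the full 128 bits this approach is computationally hopeless, which is why md5's known weakness is about collisions, not preimages.

```python
# Toy brute-force preimage search against a truncated (16-bit) md5 target.
# Even this tiny target takes thousands of tries on average; the full digest
# is far out of reach, which is the whole point of preimage resistance.
import hashlib
import itertools

target = hashlib.md5(b"legitimate package").hexdigest()[:4]   # only 16 bits of the target

for attempt in itertools.count():
    candidate = f"evil-{attempt}".encode()
    if hashlib.md5(candidate).hexdigest()[:4] == target:
        print(f"found a (truncated) preimage after {attempt + 1} tries: {candidate}")
        break
```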
The number one rule of cryptography is to always trust public analysis. A lot of people can come up with new algorithms to hash data in a secure way, but it requires a large number of eyes looking at how it's done, and brains trying to find weaknesses, to actually establish whether an algorithm is secure or not.
One of the most famous, AES, was initially published in 1998 and is still used today without any known break. It is so resilient because it won a public competition between algorithms that ran for 5 years. The best way to make your algorithm win such a competition is to find weaknesses in the others, so you're sure to have the brightest minds on the subject trying to break it.
It was nice to get a glimpse of the mathematical side of this world. The subject is still very far from what I do, but interesting. I am completely unable to assess whether a cryptographic algorithm is good or not, so I trust the experts on the subject. And it's nice to see that even experts can't easily assess every algorithm, so they too have to trust each other through public analysis.
Paul Mockapetris
We then had a quick Q&A session with Paul Mockapetris, inventor of DNS. From this exchange I'll mainly remember the following quote:
Hardware is like milk, you want it the freshest you can find. Software is like wine, you want it with a bit of age.
Web Platform Security
The last talk I could attend was by Mike West. There was another talk after this one, but I had a train to catch so I couldn't stay until the end. You can find the slides here.
Mike started by telling us the story of Ulysses and the sirens. The sirens were beautiful female creatures with charming and dangerous voices. Men who heard them would go crazy, jump overboard and drown. Ulysses learned from Circe that the only ways to protect against the sirens' singing were to put wax in your ears, or to tie yourself to the mast.
The same goes for web security. You have to tie yourself to the mast, meaning you have to follow the principle of least privilege. That way, even if attackers manage to take control of your account, they won't be able to do much. It is better to do that than to try to block every possible way an attacker could compromise an account.
This is especially true for browsers. Browsers are becoming more and more powerful, accessing banking websites, connecting to Bluetooth and cameras, and being used by average users every day. Mike, who works on the Chrome dev team, told us about the beta features they are working on in Chrome.
He told us about CSP, as we've seen, and its limitations. If you have different applications running on different subdomains, you cannot whitelist the main domain, because that would expose all subdomains to a breach in any one of them: a malicious script hosted on one infected subdomain could be loaded by another. They are working on something that would allow better granularity on the allowed scheme, host and port.
They are also adding another way to check loaded scripts, by adding a checksum verification. You would write your script tag as <script integrity="{hash}" src="..."> and the browser would check that the checksum of the loaded file matches the specified hash before executing it.
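Here is a sketch of how such an integrity hash could be computed for a file before publishing the tag; the file name and CDN URL are placeholders. The browser recomputes the hash of whatever it downloads and refuses to run the script if it doesn't match.

```python
# Sketch: computing a subresource-integrity hash for a script file
# and printing the corresponding tag. File name and URL are placeholders.
import base64
import hashlib

with open("app.js", "rb") as f:
    digest = hashlib.sha256(f.read()).digest()

sri = "sha256-" + base64.b64encode(digest).decode()
print(f'<script src="https://cdn.example.com/app.js" integrity="{sri}" crossorigin="anonymous"></script>')
```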
Conclusion
A really nice first edition of dotSecurity. Another one will take place next year and I will surely go again. I'll try to be an ambassador this time as well.
The panel of talks was broad, from generic knowledge about the security world to more technical ones. I would say the mix was perfect.
Next year I'd like to see talks explaining in more detail the selling and buying of vulnerabilities. Who discovers them? How? Why? Who do they sell them to? For how much? What do buyers do with them?
This month's HumanTalks was hosted by Criteo. We had a hard time getting the laptops correctly plugged into the projector and connected to the wifi, so we started a bit late. This forced us to cut down the questions from 10mn to only 6 or 7.
Customer Solutions Engineer
The first talk was by Nicolas Baissas, my coworker at Algolia. He explained what the job of a Customer Solutions Engineer (CSE) really means. The job originated mostly in Silicon Valley SaaS companies, but is now spreading to all sorts of companies.
The goal of a CSE is to make customers successful, by making them happy to use the product for the longest possible time. This is especially important for SaaS companies, whose revenue is based on a monthly payment. You do not want your customers to leave your service, and the best way to keep them is to offer them a better and better service.
There will always be customers leaving, which is what is called churn, but the goal of a CSE is to make sure the company has negative churn. This means that the customers who stay compensate for those who left, because they now use more of the service than before. The CSE must ensure that fewer and fewer customers want to leave, by understanding what they are looking for, while bringing more and more value of the product to the already happy customers.
The only way to do that is to try to be part of their team, showing that you're on their side and not trying to push more and more sales. CSEs are experts on their product, and they share that expertise with customers by email, over Skype or, whenever possible, by meeting them directly. This happens before, during and after Algolia's deployment in their system. The CSE ensures that they use the service at its best, and in the way that best fits their needs.
Month after month, they take a regular look at all the past implementations of their customers, and reach out with advice if they see things that could be improved. Today Algolia has more than 1000 customers but only 4 CSEs, so this has trouble scaling. So, in parallel, they also work on making things easy for all the customers they do not have time to talk to.
They write guides explaining specific features in detail, with examples. They have already explained the same things hundreds of times on calls, so they know how to explain them clearly; then it's just a matter of writing it down. They also write detailed tech tutorials. They all have a tech background and know how to code, so they can really understand what it takes to implement Algolia in an existing system.
The goal is to automate most of the recurring work. They built a new tab in the UI that analyzes a customer's current configuration and suggests improvements. Those are the exact same improvements they would have suggested during a one-to-one call, but because they have the experience and know how to code, they can simply make it self-service for users.
Some features are too complex to be correctly grasped just by reading the theory, like geosearch. So they created a demo with real data, using all the best practices, and let users play with it to see how it works. This worked really well: it turned a theoretical feature into a real-life application, and in turn generated signups to the service.
What Nicolas really stressed is that the role of a CSE is to be as close to the customer as possible, in order to really understand, in a real-life scenario, what they want to do, with the very specifics of their project, while also being as close as possible to the product itself, as part of the team that builds it, so they know exactly which features are ready and how they work. By doing both, you can bring your deep expertise of the service to the customer's specific issues, while helping build the service with real-life examples of the real-life issues customers have.
A CSE's ultimate goal is to no longer be needed: the documentation and self-service information should be enough for most users, and the core developers of the service should be in direct contact with users so they know how people really use it.
JSweet
The second talk was about JSweet, a transpiler from Java to JavaScript. A transpiler is like a compiler: it transforms one language into another, or even into the same one. There are Java-to-Java transpilers that can refactor code, and there are already a lot of tools transpiling to JavaScript (e.g. CoffeeScript, Dart and TypeScript).
Of these three, TypeScript seems to be the most popular today. It was originally created by Microsoft, and then Google started using it for Angular. TypeScript mostly adds a typed layer on top of JavaScript, but still lets you use vanilla JavaScript wherever you want.
There have already been attempts at Java-to-JavaScript transpilers in the past, namely GWT, but it was not as promising as announced and carried many inherent limitations. GWT is too much of a black box: you couldn't use the generated JavaScript with regular JS APIs, it quickly became outdated, and the promise of having all of Java in JavaScript wasn't even fulfilled. Mostly, it was made for developers who did not want to learn JavaScript and just wanted their backend code to work in the frontend as well.
Later on, we saw the emergence of NodeJS, where one of the cool features is that you can use the same language on the backend and the frontend. NodeJS being written in JavaScript, it meant you had to ditch your old Java/Ruby/PHP backend and use NodeJS instead. JSweet follows the same logic of "the same language everywhere", but this time the shared language is Java.
The TypeScript syntax being really close to Java, it is easy to transpile from one to the other. And since TypeScript transpiles to JavaScript, you can transpile all the way from Java to JavaScript.
This lets you use all your JavaScript libraries (like underscore, moment, etc.) directly in your Java code. You can also write your frontend code in Java, following the "one language everywhere" paradigm. Internally, your Java code is transpiled to TypeScript, then to JavaScript. Not all of Java is available in JSweet, though.
I have never coded in Java, so I am unsure how useful this really is, but it seems like a nice project if you want to keep the same language from back to front.
Shadow IT
The next talk, by Lucas Girardin, was about what is called Shadow IT. Shadow IT encompasses all the IT devices in a company that fly under the radar of the official IT department. It includes the employees' cell phones used to check personal email during the day, the quantified-self devices (FitBit, etc.), the Excel files filled with custom macros, the personal Dropbox accounts, and even the contracts with external freelancers that are not approved by the IT department.
Granted, these kinds of issues only occur in big companies, where there are way too many layers between the real needs of employees and a top hierarchy trying to "rationalize" things. This talk echoed the first talk about CSEs nicely, and reminded me why I quit consulting ;).
Anyway, the talk was still interesting. It started by explaining why this kind of shadow development appeared in the first place: mainly because the tools employees are given are not powerful enough to let them do their job in the best conditions. And because employees are getting more and more tech-savvy, they find their own ways to bypass the restrictions. They expect the same level of features in their day job as they have at home or on their smartphone. If their company cannot provide it, they will find other ways.
Unfortunately, those ways are frowned upon by the IT department. Maybe the Excel sheet an employee is creating is really useful, but maybe it is also illegal with regard to personal data storage. Or it will break as soon as the Excel version changes, or be completely lost when the employee leaves and no backup exists.
Then I started getting lost in the talk. Some of the concepts he talked about were so alien to what I experience every day that I had trouble understanding what it was really about. In the end, he suggested a way to rationalize those various independent parts: build a Platform that lets users create their own tools, even if they do not know how to code. This platform would get its data from a Referential, accessible company-wide, holding the only truly trustable source of data. And finally, the IT department would build Middlewares to help application A communicate with application B.
In the end, the IT department stops building custom applications for its employees and simply provides the tools to help them build those applications themselves. Still, it has to create the middlewares that let all those parts talk to each other.
I cannot help but think that this does not fix the initial issue; it simply gives the IT department the feeling that it is in control again. As soon as the platform's tools become too limited for employees to really do what they want (and this will happen really quickly), they will revert to other, more powerful tools, which will again be out of the IT department's reach. I fail to see how this is any different from before: instead of building the applications itself, the IT department now builds the tools so employees can build the applications, but it is still needed to make everything work together and will still be a bottleneck.
The last talk was presented by Michel Nguyen and was about the Criteo Hero Team.
Michel told us about his team, the Hero team, which stands for Escalation, Release and Ops. He added an H in front to make it cooler. The team is in charge of all production releases as well as dealing with incidents.
They realized that whenever they put something in production, something else breaks. So they started coordinating the releases and gathered all the people who understand the infra into the same team, to better understand where things could break.
They now use a 24/7 "follow the sun" schedule, where teams in France and the US are always awake to follow potential issues. They have a two-layer escalation system that lets them deal with minor issues without creating a bottleneck for the major ones. The Hero team is in charge of finding the root causes of issues, and if none can be found quickly enough, they find a temporary workaround and dig deeper later. Once an issue is found and fixed, they write a postmortem to share with everybody what went wrong, how they fixed it, and ways to prevent it from happening again.
They use Centreon and Nagios as part of their monitoring, and after each production release they check the state of the metrics they follow, to see if anything abnormal appeared. If too many metrics change too widely, they can assume something is not working correctly.
Criteo's current production environment is about 15,000 servers, which weigh as much as 6 Airbuses and, stacked, would be twice the height of the Empire State Building. They handle about 1200 incidents per year and resolve about 90% of them in escalation. The remaining 10% are incidents that depend on third parties, or one-shot incidents they never figured out.
To be honest, even if the talk was interesting (and Michel is a very good speaker), it felt too much like a vanity-metrics contest. I know Michel had to cut his talk from 30mn down to 10mn to fit the HumanTalks format, so I'd like to see the full version of it.
Conclusion
I did not feel like I was the target audience of the talks this time. I already know everything about the CSE job because I work with them every day, I have never coded in Java, I stopped working in companies big enough to have a Shadow IT issue and, as I said, the last talk left me hungry for more.
Still, nice talks and nice chats afterwards. Except for the small hiccup with the projector and wifi at the start, the room was perfect and very comfortable and the pizza + sushi buffet was great.
Next month we'll be at Viadeo. Hope to see you there!