Interview: Kostas Tsioutsiouliklis, Director of Engineering and Content Science at Yahoo

I recently caught up with Kostas Tsioutsiouliklis, Director of Engineering and Content Science at Yahoo, to learn more about Omid, Yahoo's transaction manager for HBase, which has been accepted as an Apache Incubator project. Kostas' team focuses on Content Acquisition and Content Understanding for web search and for the Yahoo homepage. He holds a Ph.D. in Computer Science from Princeton University and a Diploma in Computer Engineering from the University of Patras, Greece. Before joining Yahoo, Kostas was a member of the Search and Relevance team at Twitter, where he worked on trends.

insideBIGDATA: Can you describe the genesis of the Omid project at Yahoo?

Kostas Tsioutsiouliklis: Omid began as a research project at Yahoo back in 2011. The Percolator paper had been presented the year before at the OSDI conference, drawing attention to the problem of providing transactions over big data stores. Until then, most NoSQL datastores did not include transaction managers because: 1) ACID transactions were hard to scale to massive amounts of data; and 2) many applications built on top of those datastores traded off consistency for scalability, avoiding the need for ACID transactions altogether.

However, Percolator opened the door to a whole new set of web-scale applications with strong consistency requirements. The goal of the first incarnation of Omid was to build a high-performance, lock-free transaction manager on top of HBase that could handle the volume of transactions expected in web-scale, big data applications with ACID requirements.
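To make "lock-free" a little more concrete: one way to provide snapshot isolation without row locks, and broadly the approach described in Yahoo's published Omid research, is to have a central oracle hand out start and commit timestamps and abort a transaction at commit time if another transaction has committed a write to one of the same rows since it started. The sketch below is a hypothetical, in-memory Python illustration of that idea only. The names `TimestampOracle`, `Transaction`, `begin` and `commit` are invented for this example and are not Omid's API, and the sketch ignores HBase storage, persistence, memory bounds and fault tolerance.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    start_ts: int
    write_set: set = field(default_factory=set)  # row keys this transaction writes


class TimestampOracle:
    """Hypothetical centralized oracle (not Omid's real API): issues
    monotonically increasing timestamps and remembers, per row key,
    the commit timestamp of the last transaction that wrote it."""

    def __init__(self):
        self._clock = 0
        self._last_commit = {}  # row key -> commit timestamp of last writer

    def _next_ts(self):
        self._clock += 1
        return self._clock

    def begin(self):
        # Snapshot isolation: the transaction reads only data committed before start_ts.
        return Transaction(start_ts=self._next_ts())

    def commit(self, tx):
        # Write-write conflict check: no locks are taken while the transaction
        # runs; conflicts are detected once, here, at commit time.
        for row in tx.write_set:
            if self._last_commit.get(row, 0) > tx.start_ts:
                return None  # another transaction committed this row after we started: abort
        commit_ts = self._next_ts()
        for row in tx.write_set:
            self._last_commit[row] = commit_ts
        return commit_ts


if __name__ == "__main__":
    oracle = TimestampOracle()
    t1 = oracle.begin()  # start_ts = 1
    t2 = oracle.begin()  # start_ts = 2
    t1.write_set.add("row-a")
    t2.write_set.add("row-a")
    print(oracle.commit(t1))  # 3
    print(oracle.commit(t2))  # None: row-a was committed at 3, after t2 started at 2
```

Because conflicting transactions are aborted rather than blocked, no transaction ever waits on a lock held by another; the trade-off is that a loser must retry, which works well when write-write conflicts are rare.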

That first version of Omid was open-sourced on Yahoo's GitHub, and the open-source community started expressing interest in the project. In the meantime, Omid went through several internal iterations and upgrades at Yahoo, as we were planning to use it in production for our next-generation, web-scale content ingestion platform. Along the way, we identified some drawbacks in the original design and decided to re-architect it so that it could meet our scalability requirements. We released the new implementation on Yahoo's GitHub last year. The release drew further interest from the open-source community, and in the end we decided to move the project to Apache.

insideBIGDATA: How important is it that Omid, a transaction manager for HBase, has been accepted as an Apache Incubator project?

Kostas Tsioutsiouliklis: Very! We thought that moving to the Apache Software Foundation (ASF) was the logical next step for Omid after we open-sourced the code on GitHub and heard feedback from the open-source community. Yahoo has a long history of contributing to Apache projects, so as developers and contributors we understand the implications of making Omid an Apache project. Acceptance is a strong indicator of the wider community's interest in the technology and of how relevant they consider it to their own work. We are confident that Omid can benefit from the Apache environment, ecosystem, and infrastructure.

We submitted the proposal to the ASF back in March. Our aim was to bring Omid into the Incubator program and start building a stable open-source community around it in the Apache Way. We also wanted to grow the user base so as to make Omid an even more robust and flexible solution for providing transactions to NoSQL datastores. Of course, acceptance into the Incubator is just the first step in the Apache Way, and in the near future we would like to see Omid graduate to a top-level project.

insideBIGDATA: The move follows the successes of ZooKeeper and BookKeeper, which were born at Yahoo and graduated to top-level Apache projects. It seems that this marks another significant milestone for the company within the Apache community in the Hadoop ecosystem, yes?

Kostas Tsioutsiouliklis: Definitely. Yahoo has been committed to the open-source community for many years, and we have had great success working with the Apache community, especially in big data technology. By publishing our code, we can refine the technology with the help of others and enlist their help in maintaining it over time. When a project is accepted by Apache, it means that many companies agree the technology is worth having for the industry. It also confirms that our engineers are working on cutting-edge technology. In addition, we benefit from collaborating with others who are equally passionate about the problems we care about. Omid is another milestone showing that Yahoo both contributes to, and benefits from, the work we do together as an industry. That is the ultimate value of open source.

insideBIGDATA: Rumor has it that Hortonworks has adopted Omid to store Hive metadata. Can you confirm?

Kostas Tsioutsiouliklis: That is correct. At the end of 2015, the Apache Hive community expressed interest in storing Hive metadata in HBase, using Omid to ensure data consistency. We started talking with Alan Gates, Daniel Dai, and the team at Hortonworks, and that led to a fruitful collaboration that allowed Hive to integrate Omid for its metastore. The collaboration also brought several improvements to Omid, among them support for HBase 1.x and a configurable conflict resolution component that reduces hardware utilization and eases deployment in environments with limited CPU resources.

insideBIGDATA: What’s next for Omid a year out from now?

Kostas Tsioutsiouliklis: We will add new and exciting features in the next release of Omid, including high-availability support, as well as performance improvements. Stay tuned to our Apache website and @ApacheOmid on Twitter!

In its current release, Omid is stable and mature enough to attract a lot of interest. We have been approached by other Apache projects, such as Phoenix and DistributedLog, that are interested in integrating Omid. Our top priority for next year is to build a strong community of Omid users and contributors to move the project forward.
