BDU 2.0 will focus on creating a Big Data culture of success and on exploring topics such as "doing things differently than you or your organization have ever done before".
The travel industry is an amazing field for Data Science and Data Scientists at Booking.com. In this talk you will hear all about our success stories, failures, challenges, anti-patterns, good practices and everything we have learnt over the years while turning petabytes of data into awesome trips.
In recent years we have seen the rise of polyglot persistence. This is a fancy term meaning that when storing data, it is best to use multiple data storage technologies, chosen based upon the way the data is being used. If we have polyglot persistence, then sometimes we need polyglot operations. One of the most popular use cases in Big Data is searching. Almost all websites provide a search function so that users can find what they are looking for. Usually it is an Apache Lucene based solution, like Elasticsearch or Solr. I will show you how to enrich this kind of searching with the power of graph-based searches, and how to implement a polyglot search functionality where the results come from the cooperation of a search engine and a graph-based real-time recommendation engine.
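The core idea can be sketched in a few lines of Python (a minimal illustration, not the talk's actual implementation; the item names, scores and the 0.7/0.3 weighting are all made up for the example):

```python
# Hypothetical sketch: blend full-text relevance scores (as returned by an
# Elasticsearch/Solr query) with scores from a graph-based recommender.
text_scores = {"trip-amsterdam": 2.4, "trip-berlin": 1.9, "trip-budapest": 1.1}
graph_scores = {"trip-budapest": 0.9, "trip-amsterdam": 0.2}

def blended_ranking(text_scores, graph_scores, w_text=0.7, w_graph=0.3):
    """Rank items by a weighted sum of lexical and graph-based scores."""
    keys = set(text_scores) | set(graph_scores)
    combined = {k: w_text * text_scores.get(k, 0.0)
                   + w_graph * graph_scores.get(k, 0.0)
                for k in keys}
    return sorted(combined, key=combined.get, reverse=True)

print(blended_ranking(text_scores, graph_scores))
# → ['trip-amsterdam', 'trip-berlin', 'trip-budapest']
```

In a real deployment the two score maps would come from an Elasticsearch/Solr query and from a graph query (e.g. against Neo4j), and the weights would be tuned per use case.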
Open data is an intellectual treasure trove that has already helped many unexpected and often fruitful applications to surface. There are many areas where open data is considered to be of high value, and where examples of how it has been used already exist. This session will show how easy it is to work/play with open data through a BI tool. The presentation will show you how to consolidate data from different sources into a single analysis, which allows you to see the connections and answer questions. I will also give you hints and tips on where to find accessible open data sources and use cases.
This talk is for the underdog. If you're trying to solve data-related problems with no or limited resources, whether it's time, money or skills, look no further. This talk points mostly to decades-old technology, free operating systems and cheap hardware where possible, but if it makes sense to spend a hundred bucks instead of tearing your hair out, we'll say so. This talk is opinionated. The cloud is somebody else's computer; use it if it makes economic sense and you believe that distributed computing is a solved problem. A stream consists of lots of unborn events that have not been acknowledged; don't cry if you lose them. Every abstraction layer can introduce an order of magnitude of slowdown in memory, processor and I/O, along with black boxes and undebuggable errors. Unfortunately, they actually do. Nobody ever got fired for grepping files from drives mounted as memory drives (aka MEMDISK). We mostly use bash, SQL and make. Maybe Python and Go if we really have to. This talk does not contain made-up sample code or false promises of fancy technology. I talk about stuff we use in production. Period. It's gonna be fine. Nothing from Apache, no MapReduce, no streaming. Long live James Mickens.
Two of the major expenses for tech companies are development time and computing resources. Andrei Alexandrescu, a famous C++ programmer, said that when he once optimized the Facebook backend by 1%, he saved the company more than ten years' worth of his salary in electricity costs each month. To save costs, we need high-level tools that make developers productive, and we need those tools to perform well. The key to such tools is compiler technology. In this talk I will introduce how some of these tools (Apache Flink, Apache Spark, TensorFlow, etc.) can make your life easier by providing faster and more convenient runtime code generation.
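As a toy illustration of what runtime code generation means (a hedged Python sketch, not how Flink, Spark or TensorFlow work internally): instead of interpreting a generic predicate on every row, an engine can generate and compile a function specialized to the query at hand.

```python
# Toy example of runtime code generation: build Python source for a
# specialized predicate, compile it, and use it like a hand-written function.
def compile_predicate(field, value):
    src = f"def pred(row):\n    return row[{field!r}] == {value!r}\n"
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["pred"]

pred = compile_predicate("country", "HU")
rows = [{"country": "HU"}, {"country": "NL"}]
print([r for r in rows if pred(r)])  # → [{'country': 'HU'}]
```

Engines such as Spark (with its whole-stage code generation) take the same idea much further, generating and compiling code for entire query stages rather than single predicates.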
A huge volume of data is generated by telecom systems every day; in compliance with data privacy rules, we make it available for data-driven service developers.
We represent a startup called DiabTrend, whose members are eager to solve real-life healthcare problems and are convinced that data science has the best tools for it. Our focus is on diabetes, as it is one of the most common diseases of our time. According to the WHO, 422 million people were affected by it in 2014. There is still no cure for this disease, and many patients find it very hard to manage. We'll talk about how to use neural networks in the field of medical science, and about possible use cases. Through diabetes, we want to present our solution and show the performance of recurrent neural networks in predictive analysis on complex real-world problems, especially in healthcare. We will also speak about why it is important to work on these technologies today, how we can compete with big companies, and why we think the future lies in machine learning based projects.
How to write a data pipeline from scratch using high-performance components that scale better than Hadoop, in both a technical and a financial sense. “Reactively” is a marketing and product analytics platform for online businesses. From data collection to visualization, it covers all aspects of data-driven marketing and product development.
Creating large-scale web crawling networks comes with numerous problems, ranging from graph theory to memory and network optimization. At SentiOne we managed to create a system which monitors and extracts content from 500,000 domains in 23 European languages. In my presentation I will explain the key challenges in web scraping and the way we overcame them, and will also speak about our failures and the solutions that didn't work for us.
You can use logging on your IoT device(s) to collect usage statistics, for monitoring and security, or for debugging running applications. Logs can then be sent to various Big Data destinations for storage or further analysis. Learn how you can solve both the logging-agent and the central-server side using syslog-ng, through a wide range of examples that include the Amazon Kindle, the BMW i3, and industrial devices from National Instruments.
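To give a flavour of the agent side (a minimal, illustrative sketch; the hostname and port below are placeholders, not from the talk), a syslog-ng client forwarding local logs to a central server can be configured along these lines:

```
# Collect local system messages plus syslog-ng's own internal messages...
source s_local { system(); internal(); };

# ...and forward them to a central log server over TLS (placeholder host/port).
destination d_central {
    syslog("logs.example.com" transport("tls") port(6514));
};

log { source(s_local); destination(d_central); };
```

On the server side, a matching network source would receive these messages and pass them on to a Big Data destination for storage and analysis.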
The chatbot hype is over. Brands need real solutions for real conversations. This talk is about how machine learning can help human chat agents to be more productive, and how real conversations can help chatbots to get off the ground.
As part of the MOL Group Machine Learning Program, this project aims to develop automated advanced business analytics capabilities in the Danube Refinery. After a successful PoC (Proof of Concept) in 2016 on Delayed Coker Unit coke yield and steam eruption forecasting, the next steps are to:
The 3 Vs of Big Data are a quickly moving target. To shoot effectively, you need to plan for the unexpected. I will share best practices on how our customers are using Qlik Sense in Big Data / Data Lake infrastructures. These best practices aim for high adoption of analytics among a wide variety of users with different skills (not only data scientists). Thanks to the APIs of the Qlik Platform, we can manage analytics and its metadata in a more automated way. The extensibility of the Qlik Platform enables custom data visualization. Thanks to the wide connectivity possibilities, we can talk to NoSQL data sources, web services, REST APIs, R, Python and much more.
We were new to Hadoop when we started building Prefixbox. Over the last three years we have gone through a lot of iterations to take our product this far. In this talk I will share key insights from our learnings: what went well, and what did not.
Many companies are experimenting with Big Data use cases today. They set up Data Lakes to collect and manage data from a set of sources that fit the subject area, and begin to analyze the contents with carefully chosen - or just somehow already known - tools from the wide choice available in the different Hadoop distributions, or elsewhere. Fine so far. But how can the analysts - who have a vast knowledge about how business runs its course in a company - be involved in exploring that value?
In the 1930s one farmer produced enough agricultural product to feed 4 people. In the 1970s this number rose to 73. Fast forward to the 2010s: one farmer produces enough food to feed 155 people. Behind these improved capacities and great performances lie complex trials, breeding programs and huge amounts of analysed data. In 2017 we are in the era of modern technology: data can be collected easily, although the volume of continuously collected data is becoming hardly manageable. Effective data analysis is the key to maximizing the potential of the available production areas and to keep feeding the world's quickly increasing population. The agricultural sector needs to find the best way to analyse, systematize and interpret the available data in order to implement further long-term solutions for field production. Our challenge: to ensure real-time data collection processes and provide visually manageable output based on data measured in the field.
While technologies change rapidly, project failures look much the same as before. If we take a look at some Big Data project failures (not many are seen publicly), we might find more reasons on the organization's side than in the technology. One pattern of project failure seems to be related to the expectation of lowering the cost of data and analytics by relying on open source technologies.
Data Science at Booking.com
Data Janitor 101
Big Bang Theory and Telekom
From IoT to Big Data using syslog-ng
Chat automation doesn't start with chatbots
Let’s get practical. Qlik Sense design patterns for Big Data / Data Lake
Data Science to Fight Diabetes: Predictive Analysis
Project failure patterns when shooting for cheap data storage
The power of polyglot searching
Working with open data
Improving Operational Efficiency and Asset Health with Predictive Analytics in Downstream Process
Beyond Hadoop, a simple data pipeline
Compiler Technology: Key to the Performance
Data management in the field production sector: challenges, vision and effectiveness
Lessons learned while building Search Analytics pipeline using Hadoop on Azure
Hooray! We have a Data Lake! What can our Analysts do with it?
Workshops for exploring opportunities in the Era of Big Data
Deep dive into machine learning tools and appliances
Detailed revelation of distributed technologies
Hottest NoSQL approaches and tools
Best practices for using graph databases such as Neo4j
... data lovers, technical experts, and senior and C-level executives from leading innovators
in the Data Science space. Executives from startups to large corporations will attend our conference.
The conference will feature internationally recognized speakers and it may be the most powerful event you attend in 2017.
The conference is FREE OF CHARGE but attendees must register via Eventbrite.
However, VIP tickets are also available (HUF 35,000), giving you the excellent opportunity to spoil yourself with quality gourmet food and PaaS (Palinka as a Service), and to leverage excellent networking possibilities.
Why sponsor the event? Well, Big Data Universe gives you as many opportunities as there are stars
shining in the night sky.
We have developed convenient and customizable packages to help your organization meet its objectives and reach its target market in the Big Data industry.
We are dedicated to making you part of a truly great conference experience. For detailed information please download our sponsorship offer!
Over 15 leading experts in the Data Science field regularly present at our conference.
Please send an email to firstname.lastname@example.org for speaking engagements.
We are keen to hear any ideas you may have to make BDU Conference 2.0 even more special! Send your ideas to email@example.com.
By entering the event premises, you consent to interview(s), photography, video recording and its/their
release, publication, exhibition, or reproduction to be used for news, media, or any other purpose by
Big Data and its affiliates and representatives. You release Big Data Universe Conference 2.0, its
partners and each and all persons involved from any liability connected with the taking, recording,
digitizing, or publication of interviews, photographs, computer images, video and/or sound recordings.
By entering the event premises, you waive all rights you may have to any claims for payment or royalties in connection with any exhibition, streaming, webcasting, television, or other publication irrespective of whether a fee for admission or sponsorship is charged.
You have been fully informed of your consent, waiver of liability, and release before entering the event. If you would not like to be recorded, please notify our staff members upon registration.