Machine Learning On Dirty Data
www.dataiku.com

Dataiku in short
Software publisher behind Data Science Studio, the « Photoshop for Data Science »
Our objective: to make data science accessible to all types of profiles

Our clients
• They build applications with their data:
– Predicting parking spot availability
– Analysis of web activity and behaviour segmentation
– Customer churn anticipation and marketing activation
– Maintenance prevention and material breakdown impact reduction
– Fraud detection
– …
• They shorten their innovation cycles:
– DSS lowers their entry barriers and makes it easy to retrain internal teams
– Standardisation of practices and reduction of the number of tools necessary
– Easy collaboration between data analysts, business experts, and IT engineers on one platform

Turn Device Logs Into Next Year's Business
• Data sources: parking ticket machine data, OpenStreetMap data
• Cleaning and enrichment of data: Each street is segmented into small pieces that are enriched with geospatial information.
• Crossing data: The parking ticket history is joined with the points of interest from OpenStreetMap.
• Creation of a predictive algorithm: The availability of parking spots is predicted per street segment from the joined data.
• Availability of the predictions: The algorithm is finally integrated in the iPhone app « Find me a space ».

Predictive Monitoring for Search Engine Relevance
• User searches: Words within the requests are analysed by the studio.
• Web logs: Web logs with clicks and bounce rates are imported in the studio.
• Data cleaning and enrichment: Web logs are enriched (time spent on the website per user, location, etc.)
• Customized algorithm: The Data Science team of PagesJaunes identifies unsuccessful searches and trains a customized algorithm.
• Algorithm used with all data: Long-term monitoring of unsuccessful searches

“Dataiku’s technology enabled us to rationalise our work thanks to machine learning on millions of searches. The process is optimized, we know what and how to do it.”
Erwan Pigneul, Project Manager, PagesJaunes

Optimizing Last Mile with Data Science Studio
• Historical delivery and retrieval data
• Cleaning and temporal enrichment of data
• Data aggregation by geographic location
• Incorporation of new deliveries into the existing model
• Modeling of a score for each delivery

Predictive Model To Optimize Restaurant Pages
• Restaurant data (place, type…)
• User feedback (comments, length…)
• Traffic logs (visits, clicks, time…)
• Data Science Studio: cleaning and enriching, centralizing the data, analysis and modeling
• Scoring of a restaurant’s page parameters in terms of customer satisfaction
• Increase website traffic by optimizing the correct parameters

Create value with data driven applications
• DATA IN: parking ticket machine data, OpenStreetMap data
• ENRICH / COMBINE / COMPUTE: Data Science Studio (cleaning and enrichment of data, crossing data, creation of a predictive algorithm)
• VALUE OUT: availability of the predictions in the iPhone app « Find me a space »
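
The parking example (ticket history joined with OpenStreetMap points of interest, then availability predicted per street segment) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample records, the segment identifiers, the capacity figures, and the function names are hypothetical, and the prediction is a plain historical-average baseline, not the algorithm built for the client or anything provided by Data Science Studio.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ticket history: (segment_id, hour_of_day, tickets_sold).
ticket_history = [
    ("seg-1", 9, 12),
    ("seg-1", 9, 8),
    ("seg-1", 14, 3),
    ("seg-2", 9, 1),
    ("seg-2", 9, 3),
]

# Hypothetical points of interest, already mapped to a street segment.
points_of_interest = [
    ("seg-1", "restaurant"),
    ("seg-1", "school"),
    ("seg-2", "park"),
]

# Assumed number of parking spots per street segment.
capacity = {"seg-1": 20, "seg-2": 10}


def count_pois(pois):
    """Count points of interest per street segment."""
    counts = defaultdict(int)
    for segment_id, _kind in pois:
        counts[segment_id] += 1
    return dict(counts)


def join_history_with_pois(history, pois):
    """Crossing step: enrich each ticket record with its segment's POI count."""
    poi_counts = count_pois(pois)
    return [
        {
            "segment": segment_id,
            "hour": hour,
            "tickets": tickets,
            "poi_count": poi_counts.get(segment_id, 0),
        }
        for segment_id, hour, tickets in history
    ]


def predict_availability(rows, capacity, segment_id, hour):
    """Baseline: share of spots left free after the mean historical demand.

    Returns None when there is no history for this segment and hour.
    """
    observed = [
        r["tickets"]
        for r in rows
        if r["segment"] == segment_id and r["hour"] == hour
    ]
    if not observed:
        return None
    free_spots = max(capacity[segment_id] - mean(observed), 0)
    return free_spots / capacity[segment_id]


rows = join_history_with_pois(ticket_history, points_of_interest)
print(predict_availability(rows, capacity, "seg-1", 9))
```

The baseline only uses the ticket counts. A trained model would also take enriched columns such as `poi_count` as input features, which is the reason for joining the two sources before modeling.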
A MODEL
An automated way to make a computer take a decision from raw (historical) data.
The model can be used to take immediate (real-time) actions through an API.
Churn, Segmentation, Recommender, Lifetime Value, Volume Forecast, Score, Location, Risk, Hot, Pricing, Ranking, Event Paths, Fraud

2015 : BUILD YOUR FACTORY
• Multiple Data Sources: CRM, Logs
• Many Models: Personalised Experience Model, Acquisition Cost Opportunity Model, Stock Optimisation Model, Optimize Delivery
• Analyst Team
• Server Cluster
• Light Software

But …
“Data Science Superstars are really hard to hire.”
“I spend too much time cleaning up my data with inappropriate tools.”
“Our models are quite difficult to set up so they are rarely deployed into production.”
“There is too much plumbing involved in making all these Big Data technologies work together and then in successfully deploying applications with them.”

Data Science Studio
A studio for all your data driven applications
• Load and prepare your data
• Analyse and build your models
• Publish and run your projects
• For all profiles
• Collaborative
• Open and controlled

Data preparation
• Connect to all your data sources
• Explore them visually
• Transform and enrich them interactively
• Save your ‘recipes’ and reuse them later

Analyse and model
• Discover correlations and significant variables
• Easily build your first models in a visual interface
• Test and improve several models alongside one another
• Deploy the models’ results directly inside your infrastructure

Deploy into production
• Go quickly from prototypes to large scale production
• Manage data inputs and outputs from the interface
• Export and publish your results in several forms
• Control the updates with options such as scheduling, partitions, and replications…

Collaborative work
• Enjoy a web interface and a shared platform
• Organise your work by projects and by teams
• Reuse the team’s work at any time
• Make sure everyone is always on the same page: share insights, graphics, comments, etc. with your team

Open and controlled
• Take advantage of open source technologies such as Hadoop, iPython, scikit-learn, R…
• Integrate your own libraries and scripts
• Keep the data safe in your own infrastructure
• Keep your innovations under control: algorithms and predictions belong to you

“We could probably better understand our users. But how?”
“My data is too dirty. I don’t even know where to start.”
“There’s a trend here, but our full historical data is just too big.”
You have data
You have ideas
You need a tool
http://www.dataiku.com/dss/trynow/

Florian
[email protected]

Dataiku HQ
2 rue Jean Lantier
75001 Paris
France

Dataiku West
2423A Durant Avenue
Berkeley, CA 94704

ANNEXES

A predictive application?
Data + Algorithms (Machine Learning) = Predictions
Requirements:
– Data: collection, preparation, crossing
– Algorithms: knowledge, iterations, calculation
– Predictions: industrialization, deployment