If you have the Content, then Apache has the Technology!
A whistle-stop tour of the Apache content related projects
Nick Burch
Software Engineer
Alfresco

Apache Projects
• 79 Top Level Projects
• 40 Incubating Projects
• 30 “Content Related” Main Projects
• 7 “Content Related” Incubating Projects

37 Projects in 50 minutes
With time for questions...
This is not a comprehensive guide!

Different Technologies
• Serving
• Storing
• Transforming
• Generating
• Hosting
• Web Framework
• Rendering / Templating / etc

What can we get in 50 mins?
• A quick overview of each project
• When talks on the project are happening
• When meetups on the project are happening
• Anything new/exciting about the project?
• What interests me in the project!

Serving up your Content

Apache HTTPD Server
http://httpd.apache.org/
• Talks – All day Wednesday
• Meetup – Thursday evening
• Very wide range of features
• (Fairly) easy to extend
• Can host most programming languages
• Can front most content systems
• Can proxy your content applications
• Can host code and content

Apache TrafficServer
http://trafficserver.apache.org/
• High performance web proxy
• Forward and reverse proxy
• Ideally suited to sitting between your content application and the internet
• For proxy-only use cases, will probably be better than httpd
• Fewer other features though
• Often used as a cloud-edge http router

Apache Tomcat
http://tomcat.apache.org/
• Talks – All day Friday!
• Java based, as many of the Apache Content Technologies are
• Java Servlet Container
• And you probably all know the rest!

Tomcat – What's New
http://tomcat.apache.org/
• Memory leak detection – for your applications, and for the JVM!
• Easier to embed – no need for large numbers of config files!
• Asynchronous request processing for things like Comet / Bayeux
• Servlet 3.0
• Improved JMX configurability

Storing all that Content

Apache Cassandra
http://cassandra.apache.org/
• Talk – 11am Wednesday
• Meetup – Wednesday evening
• One of our many NoSQL Databases
• Column-Family store
• Eventually consistent
• Distributed, replicating, no single point of failure
• Can elastically add machines

Apache CouchDB
http://couchdb.apache.org/
• 12pm Wednesday
• Relax!
• Erlang
• NoSQL
• Document orientated distributed store
• Eventually consistent if replicating
• Map-Reduce queries

Apache HBase
http://hbase.apache.org/
• 2pm Wednesday
• Recently graduated from Hadoop
• Another NoSQL Database
• Column-Family store, modelled on Google's Bigtable paper
• Some transactions and locking
• Fast range queries and sorting
• Built on HDFS

Which Apache NoSQL?
• Do you have tuples, documents, variable key/values or complex objects?
• Must data always be consistent?
• If you lose a chunk of machines (partition), should read/write still work?
• Query by id, range, arbitrary key/value or map-reduce function?
• How much human interaction is required to add or remove nodes?

Apache DB: Derby
http://db.apache.org/derby/
• Small, easy to embed SQL database
• Can be embedded and accessed via an embedded JDBC driver
• Can be accessed over the network
• Can be run entirely in-memory
• Efficient on-disk format
• Has a JavaME version – run it on basic cell phones!
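As a rough sketch of the three Derby access modes listed above (embedded, in-memory, over the network), each mode is selected by the form of the JDBC URL. The class name `DerbyUrls`, its helper methods and the database name `demoDB` are hypothetical and only for illustration; 1527 is Derby's default network port. Actually connecting assumes the Derby jars (derby.jar, or derbyclient.jar for network access) are on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: builds Derby JDBC URLs for the three access modes.
class DerbyUrls {
    // Embedded, on-disk database in the current working directory.
    static String embedded(String db) {
        return "jdbc:derby:" + db + ";create=true";
    }

    // Embedded database held entirely in memory (lost when the JVM exits).
    static String inMemory(String db) {
        return "jdbc:derby:memory:" + db + ";create=true";
    }

    // Database served by a Derby Network Server on host:port.
    static String network(String host, int port, String db) {
        return "jdbc:derby://" + host + ":" + port + "/" + db + ";create=true";
    }

    public static void main(String[] args) {
        String url = inMemory("demoDB");
        // Without the Derby jar on the classpath, getConnection throws SQLException.
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected to " + url);
        } catch (SQLException e) {
            System.out.println("Could not connect (is Derby on the classpath?): " + e.getMessage());
        }
    }
}
```

Only the URL changes between modes; the rest of the JDBC code stays the same, which is what makes it easy to start embedded and move to a network server later.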
Apache Directory
http://directory.apache.org/
• LDAP Directory
• Optimised for many reads per write
• Hierarchical, class/attribute based storage
• Triggers, stored procedures, queries and views
• Multi-master replication
• Rich permissions model built in

Apache Jackrabbit
http://jackrabbit.apache.org/
• 1.30pm Thursday
• JCR (Java Content Repository)
• Hierarchical content store
• Supports structured and unstructured data
• Transactional
• Supports versions
• Full text search built in

Apache Lucene
http://lucene.apache.org/
• All day Friday + Meetup Tuesday night
• Inverted index store (each term lists its documents, rather than each document listing terms)
• Searching is faster than adding
• Normally stores text, but additional data can be associated with it
• Can hold indexed and un-indexed data

Lucene – What's New?
http://lucene.apache.org/
• Lucene and SOLR have merged
• Near real-time support when indexing
• Better storing of attributes and other data in the token stream
• Numeric fields improved – no need to externally process numbers into range buckets yourself
• Fast vector highlighter for large docs

Apache Subversion
http://subversion.apache.org/
• Meetup Thursday evening
• Versioning content store
• Efficient at storing changes
• Normally stores code, text and the odd binary blob
• If you have textual data and you want a versioning store, it's a good fit!
• Used by the new Apache CMS

Apache Xindice
http://xml.apache.org/xindice/
• Native XML Database
• No need to map your complex XML files to a different data structure
• Ideally suited to problems where you have large numbers of XML files, and little / no other content
• Schema independent model
• XPath queries

Transforming and Reading Content

Apache PDFBox
http://pdfbox.apache.org/
• 4pm Wednesday
• Read, Write, Create and Edit PDFs
• Create PDFs from text
• Fill in PDF forms
• Extract text and formatting (Lucene, Tika etc)
• Edit existing files, add images, add text etc

Apache POI
http://poi.apache.org/
• 3pm Wednesday + Fast Feather Track
• File format reader and writer for Microsoft Office file formats
• Supports binary & OOXML formats
• Strong read, edit and write for .xls & .xlsx
• Read and basic edit for .doc & .docx
• Read and basic edit for .ppt & .pptx
• Read for Visio, Publisher, Outlook

Apache Tika
http://tika.apache.org/
• 9am Friday + Fast Feather Track
• Java (+ command line) toolkit for detecting and extracting content
• Identifies what a blob of content is
• Gives you consistent metadata back for it
• Parses the contents into plain text, HTML, XHTML or SAX events

Tika – What's New?
http://tika.apache.org/
• Lots of new parsers – text, office formats, publishing formats, images, audio, CAD, fonts etc
• Long standing parsers improved – better HTML from Word for example
• Embedded resources and containers
• Use expanding – used by many SOLR users, Alfresco, lots of people crunching masses of data on Hadoop

Apache Cocoon
http://cocoon.apache.org/
• Component Pipeline framework
• Plug together “Lego-Like” generators, transformers and serialisers
• Generate your content once in your application, serve to different formats
• Read in formats, translate and publish
• Can power your own “Yahoo Pipes”
• Modular, powerful and easy

Apache Xalan
http://xalan.apache.org/
• XSLT processor
• XPath engine
• Java and C++ flavours
• Cross platform
• Library and command line executables
• Transform your XML
• Fast and reliable XSLT transformation engine

Apache XML Graphics: Batik
http://xmlgraphics.apache.org/#batik
• Java SVG toolkit + library
• SVG Parser – read and process existing SVG files
• SVG Generator – Graphics2D implementation that outputs SVG
• SVG DOM – easy way to manipulate your SVG files
• SVG viewer program (Squiggle)
• Command line SVG rasteriser

Apache XML Graphics: FOP
http://xmlgraphics.apache.org/#fop
• XSL-FO processor in Java
• Reads W3C XSL-FO, applies the formatting rules to your XML document, and renders it
• Output to Text, PS, PDF, SVG, RTF, Java Graphics2D etc
• Lets you leave your XML clean, and define semantically meaningful rich rendering rules for it

Apache Commons: Codec
http://commons.apache.org/codec/
• Commons Track – Thursday Morning
• Encode and decode a variety of encoding formats
• Base64, Hex, Phonetic and URLs
• Handy when interchanging content with external systems

Apache Commons: Compress
http://commons.apache.org/compress/
• Commons Track – Thursday Morning
• Standard way to deal with archive formats
• Read and write support
• zip, tar, gzip, bzip, cpio and ar
• Wider range of capabilities than java.util.zip
• Common
API across all formats

Apache Commons: Sanselan
http://commons.apache.org/sanselan/
• Commons Track – Thursday Morning
• Pure Java image reader and writer
• Fast parsing of image metadata and information (size, color space, ICC etc)
• Much easier to use than ImageIO
• Slower though, as pure Java
• Wider range of formats supported
• PNG, GIF, TIFF, JPEG + Exif, BMP, ICO, PNM, PPM, PSD, XMP

Generating Content

Apache Forrest
http://forrest.apache.org/
• Document rendering solution built on top of Cocoon
• Reads in content in a variety of formats (xml, wiki etc), applies the appropriate formatting rules, then outputs to different formats
• Heavily used for documentation and websites
• eg read in a file, format as changelog and readme, output as html + pdf

Apache Abdera
http://abdera.apache.org/
• Atom – syndication and publishing
• High performance Java implementation of RFC 4287 + 5023
• Generate Atom feeds from Java or by converting
• Parse and process Atom feeds
• Atompub server and clients
• Supports Atom extensions like GeoRSS, MediaRSS & OpenSearch

Apache Droids (Incubating)
http://incubator.apache.org/droids/
• Intelligent Robots!
• Generic standalone crawler framework
• Easy to extend existing common crawlers
• Easy to write custom ones
• Queue requests for content, protocol handler gets it, multi threaded
• Uses Apache Tika for core of handling fetched resources

Apache JSPWiki (Incubating)
http://incubator.apache.org/jspwiki/
• Feature-rich extensible wiki
• Written in Java (Servlets + JSP)
• Fairly easy to extend
• Can be used as a wiki out of the box
• Provides a good platform for new wiki based applications
• Rich wiki markup and syntax
• Attachments, security, templates etc

Apache ManifoldCF (Incubating)
http://incubator.apache.org/connectors/
• Name has changed a few times...
(Lucene/Apache Connectors)
• Provides a standard way to get content out of other systems, ready for sending to Lucene etc
• Different goals to CMIS (Chemistry)
• Uses many parsers and libraries to talk to the different repositories / systems
• Analogous to Tika but for repos

Apache PhotArk (Incubating)
http://incubator.apache.org/photark/
• 5pm Thursday
• Open Source Photo Gallery application
• Standalone or servlet modes
• Can host photos locally
• Can aggregate external photo albums (Flickr, Picasa) for a unified view
• SCA programming model – uses Apache Tuscany to power it

Hosting Content

Apache Chemistry (Incubating)
http://incubator.apache.org/chemistry/
• 2pm Wednesday
• Java, Python and PHP, Atom and WS*
• OASIS CMIS (Content Management Interoperability Services)
• Client and Server bindings
• “SQL for Content”
• Consistent view on content across different repositories
• Read / Write / Manipulate content

Chemistry vs ManifoldCF
incubator /chemistry/ /connectors/
• ManifoldCF treats the repo as a nasty black box, and handles talking to the parsers
• Chemistry talks to / exposes the repo's contents through CMIS
• ManifoldCF supports a wider range of repositories
• Chemistry supports read and write
• Chemistry delivers a richer model
• ManifoldCF is great for getting text out

Apache Lenya
http://lenya.apache.org/
• 9am Thursday
• XML Content Management system
• Powered by Apache Cocoon
• WYSIWYG editors onto Relax-NG XML
• Rich workflow engine + staging
• Clean URLs, CSS for styling
• Sensible handling of metadata, assets, internal links, users, permissions etc

Apache Roller
http://roller.apache.org/
• Multi-user blog server
• Used by the ASF internally
• Scales to thousands of users & blogs
• Should work with any JavaEE servlet container and SQL database
• Comment moderation and spam filters
• Each author has full layout control
• Indexes, feeds and Metaweblog API support for 3rd party clients

Apache Shindig
http://shindig.apache.org/
• OpenSocial Application
Container
• Hosts your OpenSocial widgets
• Renders OpenSocial applications into HTML + JavaScript
• Stores the data for your application
• Full client-side JavaScript libraries to deliver gadget functionality
• Reference implementation

Apache Wookie (Incubating)
http://incubator.apache.org/wookie/
• 5.30pm Wednesday
• W3C Widgets server
• Upload, Deploy and Host Widgets
• Widgets can range from a badge, through a small app to a full-blown collaborative system like chat
• Connector framework to make it easy to write widgets in many languages

Web Frameworks (those with a strong Content focus to them)

Apache Sling
http://sling.apache.org/
• 12pm Wednesday
• “Fun” and easy web framework
• REST based
• Backed by Jackrabbit content repo
• Powered by OSGi
• Easy to script, supports multiple output languages (JSP, server side JavaScript, Scala etc)
• Stores both templates and content

Apache Tapestry
http://tapestry.apache.org/
• Object Orientated web applications
• Build your application in terms of objects, methods and properties
• Tapestry handles URLs, query parameters and state for you
• Pages built with simple HTML
• Concentrate on the content that backs each part, and the business logic for it
• Tapestry glues it together for you

Apache Tiles
http://tiles.apache.org/
• Templating framework for Java
• Works well with Struts and Shale
• Lets you build your page from lots of tiles (components), which can nest
• Build tiles together to make templates
• Clean separation between your content, the business logic to select it, and the rendering rules

Apache Velocity
http://velocity.apache.org/
• Templating engine
• MVC webapp or standalone
• Can generate HTML, SQL, PostScript, XML, Java Code or email from templates
• Anakia lets you make an xdoc file available to a Velocity template, handy when generating HTML from xdoc
• Fairly rich templating language

Apache Wicket
http://wicket.apache.org/
• Build your web applications in Java
• Uses Java in preference to JavaScript, CSS etc
• Handy if you have a strong Java team and you need to do some web stuff
• Fits well with your Java components
• But JS / CSS front end devs tend to be cheaper than Java ones....

Apache Clerezza (Incubating)
http://incubator.apache.org/clerezza/
• OSGi based modular semantic web application framework
• Lets you build applications that fit into the Semantic Web
• Stores and easily manipulates RDF
• Full control over REST and URIs
• Build applications that both consume semantic data (eg RDF files), and that expose content to others

Any Questions?
Any cool projects that I happened to miss?