Cloud Programming on Java EE Platforms
mgr inż. Piotr Nowak

GridGain

❖ Two main versions:
   ❖ Open Source
   ❖ Enterprise
❖ http://www.gridgain.com
❖ http://www.gridgain.com/products/in-memory-datafabric/editions/

Install

❖ Download, unpack, use:
   ❖ www.gridgain.com/products/in-memory-datafabric/editions/
❖ export GRIDGAIN_HOME=<GridGain_install_dir>

Setup

❖ Configure the client project with Maven:

    <dependency>
        <groupId>org.gridgain</groupId>
        <artifactId>gridgain-fabric</artifactId>
        <version>${version}</version>
        <type>pom</type>
    </dependency>

❖ Start a cluster node with:

    $ bin/ggstart.sh <optional configuration>

❖ or from a Maven project with:

    Grid g = GridGain.start(<optional configuration>);

❖ The default configuration file is config/default-config.xml.

Configuration

❖ GridGain environment configuration covers:
   ❖ node discovery type
   ❖ the IP address range used for discovery
   ❖ etc.
❖ Two methods:
   ❖ programmatic configuration with the GridConfiguration class
   ❖ Spring XML configuration, passed either as a command-line argument to ggstart.sh or as a GridGain.start() parameter

Characteristics

❖ All nodes have the same role.
❖ There is no management node, so there is no single point of failure.
❖ By default, data is stored in distributed in-memory structures:
   ❖ the default behavior is to work as an In-Memory Data Grid.
❖ GridGain can also work with a storage layer containing HDFS.
❖ Data replication: in case of a node failure, data is automatically backed up on other nodes.

GridGain Facade

❖ Interface Grid - the default API for working with the cluster:
   ❖ access to cluster items
   ❖ cluster management: start, restart, stop, etc.
❖ Example:

    // Create new configuration.
    GridConfiguration cfg = new GridConfiguration();

    // Provide lifecycle bean to configuration.
    cfg.setLifecycleBeans(new MyGridLifecycleBean());

    // Start grid with given configuration.
    Grid grid = GridGain.start(cfg);

GridGain Facade

❖ Interface GridProjection (extended by Grid):
   ❖ access to a group of nodes
   ❖ selects nodes based on certain rules
   ❖ node filtering

    // Get projection with random node out of remote nodes.
    GridProjection randomNode = grid.forRemotes().forRandom();

    // Get projection with nodes residing on the same host as the random node.
    GridProjection hostNodes = grid.forHost(randomNode.node());

    // Get projection of all nodes that have the "worker" attribute defined and
    // have current CPU load less than 50%.
    GridProjection predicateNodes = grid.forPredicate(new GridPredicate<GridNode>() {
        @Override public boolean apply(GridNode n) {
            return n.attribute("worker") != null && n.metrics().getCurrentCpuLoad() < 0.5;
        }
    });

Nodes

❖ One physical machine is not limited to one GridGain node instance.

Node Discovery

❖ Multicast Based Discovery
❖ Static IP Based Discovery
❖ Multicast and Static IP Based Discovery
❖ Amazon S3 Based Discovery

Multicast

❖ GridTcpDiscoveryMulticastIpFinder uses multicast to discover other nodes in the grid and is the default IP finder. You do not have to specify it unless you plan to override the default settings. Here is an example of how to configure this finder via a Spring XML file:

    <bean class="org.gridgain.grid.GridConfiguration">
        ...
        <property name="discoverySpi">
            <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.gridgain.grid.spi.discovery.tcp.ipfinder.multicast.GridTcpDiscoveryMulticastIpFinder">
                        <property name="multicastGroup" value="228.10.10.157"/>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
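The same multicast discovery can also be configured programmatically, as mentioned on the Configuration slide. A minimal sketch follows; the setter names (setDiscoverySpi, setIpFinder, setMulticastGroup) mirror the Spring property names above and the GridGain 6.x API, but verify them against your version:

    import org.gridgain.grid.Grid;
    import org.gridgain.grid.GridConfiguration;
    import org.gridgain.grid.GridGain;
    import org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi;
    import org.gridgain.grid.spi.discovery.tcp.ipfinder.multicast.GridTcpDiscoveryMulticastIpFinder;

    // Configure the multicast IP finder in Java instead of Spring XML.
    GridConfiguration cfg = new GridConfiguration();

    GridTcpDiscoverySpi discoSpi = new GridTcpDiscoverySpi();
    GridTcpDiscoveryMulticastIpFinder ipFinder = new GridTcpDiscoveryMulticastIpFinder();

    // Same multicast group as in the XML example above.
    ipFinder.setMulticastGroup("228.10.10.157");
    discoSpi.setIpFinder(ipFinder);
    cfg.setDiscoverySpi(discoSpi);

    // Start a node with this configuration.
    Grid grid = GridGain.start(cfg);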
Static IP

❖ For cases when multicast is disabled, GridTcpDiscoveryVmIpFinder should be used with a preconfigured list of IP addresses.
❖ You are only required to provide at least one IP address, but it is usually advisable to provide 2 or 3 addresses of the grid nodes that you plan to start first, for redundancy. Once a connection to any of the provided IP addresses is established, GridGain automatically discovers all other grid nodes.

    <bean class="org.gridgain.grid.GridConfiguration">
        ...
        <property name="discoverySpi">
            <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.gridgain.grid.spi.discovery.tcp.ipfinder.vm.GridTcpDiscoveryVmIpFinder">
                        <property name="addresses">
                            <list>
                                <value>1.2.3.4:47500..47509</value> <!-- IP address and port range -->
                                <value>10.10.10.9:47500</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>

Multicast and Static IP

❖ You can use both multicast and static IP based discovery together. In this case, in addition to any addresses received via multicast, GridTcpDiscoveryMulticastIpFinder can also work with a preconfigured list of static IP addresses, just like the static IP based discovery described above. Here is an example of how to configure the multicast IP finder with static IP addresses:

    <bean class="org.gridgain.grid.GridConfiguration">
        ...
        <property name="discoverySpi">
            <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.gridgain.grid.spi.discovery.tcp.ipfinder.multicast.GridTcpDiscoveryMulticastIpFinder">
                        <property name="multicastGroup" value="228.10.10.157"/>
                        <!-- list of static IP addresses -->
                        <property name="addresses">
                            <list>
                                <value>1.2.3.4:47500..47509</value> <!-- IP address and port range -->
                                <value>10.10.10.9:47500</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>

Amazon S3

❖ In the Amazon AWS cloud, where multicast is disabled, GridGain supports automatic node discovery by utilizing an S3 store via GridTcpDiscoveryS3IpFinder. On startup, nodes register their IP addresses with the Amazon S3 store. This way other nodes can try to connect to any of the IP addresses stored in S3 and initiate automatic grid node discovery.

    <bean class="org.gridgain.grid.GridConfiguration">
        ...
        <property name="discoverySpi">
            <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.gridgain.grid.spi.discovery.tcp.ipfinder.s3.GridTcpDiscoveryS3IpFinder">
                        <property name="awsCredentials" ref="aws.creds"/>
                        <property name="bucketName" value="YOUR_BUCKET_NAME_IP_FINDER"/>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>

    <!-- AWS credentials. Provide your access key ID and secret access key. -->
    <bean id="aws.creds" class="com.amazonaws.auth.BasicAWSCredentials">
        <constructor-arg value="YOUR_ACCESS_KEY_ID"/>
        <constructor-arg value="YOUR_SECRET_ACCESS_KEY"/>
    </bean>

How to connect with the cluster?

❖ Create a Java project.
❖ Use the GridGain class.
❖ GridGain has to be started with the start() method.
❖ Obtain the grid instance via GridGain.grid().
❖ Implement the computation - see the sketch below.
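Putting these steps together, here is a minimal sketch of a client that joins the cluster, reports the topology size, and stops. It assumes the GridGain 6.x API (grid.nodes() on the projection, GridGain.stop(boolean) for shutdown); adjust for your version:

    import org.gridgain.grid.Grid;
    import org.gridgain.grid.GridException;
    import org.gridgain.grid.GridGain;

    public class ClusterClient {
        public static void main(String[] args) throws GridException {
            // Start a node with the default configuration; it joins the cluster
            // through whichever discovery mechanism is configured.
            Grid grid = GridGain.start();

            try {
                // The same instance is also available anywhere via GridGain.grid().
                System.out.println("Nodes in topology: " + grid.nodes().size());
            }
            finally {
                // Stop the local node; 'true' cancels currently running jobs.
                GridGain.stop(true);
            }
        }
    }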
Distributed Jobs

❖ The GridCompute class provides several methods for job execution.
❖ GridCompute defines compute grid functionality for executing tasks and closures over nodes in the GridProjection. An instance of GridCompute is obtained from a grid projection as follows:

    GridCompute c = GridGain.grid().compute();

❖ The methods are grouped as follows:
   ❖ apply(...) methods execute GridClosure jobs over nodes in the projection.
   ❖ call(...) methods execute Callable jobs over nodes in the projection. Use GridCallable for better performance, as it implements Serializable (a call(...) sketch follows the runnable example below).
   ❖ run(...) methods execute Runnable jobs over nodes in the projection. Use GridRunnable for better performance, as it implements Serializable.
   ❖ broadcast(...) methods broadcast jobs to all nodes in the projection.
   ❖ affinity(...) methods colocate jobs with the nodes on which a specified key is cached.

Example Job Runnable

    Collection<GridFuture> futs = new ArrayList<>();

    // Iterate through all words in the sentence and create runnable jobs.
    for (final String word : "Print words using runnable".split(" ")) {
        // Execute runnable on some node.
        futs.add(grid.compute().run(new GridRunnable() {
            @Override public void run() {
                System.out.println(">>> Printing '" + word + "' on this node from grid job.");
            }
        }));
    }
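For comparison, a hedged sketch of the call(...) variant from the method list above, which returns a value per job. It assumes that call(...) on a collection of GridCallable instances returns a future over the collected results, consistent with the .get() usage elsewhere in these slides:

    Collection<GridCallable<Integer>> calls = new ArrayList<>();

    // One callable job per word; each returns the word's length.
    for (final String word : "Count characters using callable".split(" ")) {
        calls.add(new GridCallable<Integer>() {
            @Override public Integer call() {
                return word.length();
            }
        });
    }

    // Execute the jobs on the projection and sum the partial results.
    int total = 0;

    for (int len : grid.compute().call(calls).get())
        total += len;

    System.out.println(">>> Total characters: " + total);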
MapReduce

❖ Map
   ❖ This method instantiates the jobs and maps them to worker nodes. The method receives the grid projection on which the task is run and the task argument. It should return a map with jobs as keys and mapped worker nodes as values. The jobs are then sent to the mapped nodes and executed there.
❖ Result
   ❖ This method is called each time a job completes on a worker node. The first argument is the result returned by the completed job. The second argument holds the list of already received results. The method should return a GridComputeJobResultPolicy instance, indicating what to do next:
      ❖ WAIT - wait for all remaining jobs to complete (if any)
      ❖ REDUCE - immediately move to the reduce step, discarding all remaining jobs and not-yet-received results
      ❖ FAILOVER - fail the job over to another node (see Fault Tolerance)
❖ Reduce
   ❖ This method is called during the reduce step, when all jobs have completed (or the REDUCE result policy has been returned from result()). It receives a list with all the collected results and should return the final result of the computation.

MapReduce example

    // Execute task on the grid and wait for its completion.
    int cnt = grid.compute().execute(CharacterCountTask.class, "Hello Grid Enabled World!").get();

    System.out.println(">>> Total number of characters in the phrase is '" + cnt + "'.");

    /**
     * Task to count non-white-space characters in a phrase.
     */
    private static class CharacterCountTask extends GridComputeTaskAdapter<String, Integer> {
        // Splits the received string into words, creates a child job for each word, and sends
        // these jobs to other nodes for processing. Each job prints out the received word
        // and returns its length.
        @Override public Map<? extends GridComputeJob, GridNode> map(List<GridNode> subgrid, String arg) {
            String[] words = arg.split(" ");

            Map<GridComputeJob, GridNode> map = new HashMap<>(words.length);

            Iterator<GridNode> it = subgrid.iterator();

            for (final String word : words) {
                // If we used all nodes, restart the iterator.
                if (!it.hasNext())
                    it = subgrid.iterator();

                GridNode node = it.next();

                map.put(new GridComputeJobAdapter() {
                    @Nullable @Override public Object execute() {
                        System.out.println(">>> Printing '" + word + "' on this node from grid job.");

                        // Return number of letters in the word.
                        return word.length();
                    }
                }, node);
            }

            return map;
        }

        @Nullable @Override public Integer reduce(List<GridComputeJobResult> results) {
            int sum = 0;

            for (GridComputeJobResult res : results)
                sum += res.<Integer>getData();

            return sum;
        }
    }

Client Node

❖ Two modes of storing data in the grid:
   ❖ REPLICATED - the same data is stored on all grid nodes
   ❖ PARTITIONED - data is distributed between the participating nodes
❖ Client node:
   ❖ does not store any data - a node configured with GridCacheDistributionMode.CLIENT_ONLY
   ❖ can be used as an entry node to the cluster

Hazelcast

❖ Two main versions:
   ❖ Open Source
   ❖ Enterprise
❖ http://hazelcast.org
❖ http://hazelcast.com/products/

Install

❖ Download, unpack, use:
   ❖ http://hazelcast.org/download/

Setup

❖ Configure the client project with Maven:

    <dependencies>
        <dependency>
            <groupId>com.hazelcast</groupId>
            <artifactId>hazelcast</artifactId>
            <version>3.4</version>
        </dependency>
    </dependencies>

❖ Start a cluster with bin/server.sh, updated with:
   ❖ JAVA_HOME=<your java home>
   ❖ HAZELCAST_HOME=<your directory with hazelcast>

Configuration

❖ Two methods:
   ❖ programmatic, with the Config class
   ❖ the hazelcast.xml configuration file
❖ For basic usage the default file should be enough.

Characteristics

❖ All nodes have the same role.
❖ There is no management node, so there is no single point of failure.
❖ By default, data is stored in distributed in-memory structures:
   ❖ the default behavior is to work as an In-Memory Data Grid.
❖ Data replication: in case of node failure, data is automatically backed up on other nodes.

Node Discovery

❖ Multicast Based Discovery
❖ Discovery by TCP/IP
❖ Amazon EC2 Based Discovery

Multicast

❖ With the multicast auto-discovery mechanism, Hazelcast allows cluster members to find each other using multicast communication.
❖ The cluster members do not need to know the concrete addresses of each other; they just multicast to everyone listening.
❖ Whether multicast is possible or allowed depends on your environment.

TCP/IP

❖ If multicast is not the preferred way of discovery for your environment, you can configure Hazelcast to be a full TCP/IP cluster.
❖ When configuring Hazelcast for discovery by TCP/IP, you must list all or a subset of the nodes' hostnames and/or IP addresses.
❖ Note that you do not have to list all cluster members, but at least one of the listed members has to be active in the cluster when a new member joins.

Amazon EC2

❖ Hazelcast supports EC2 Auto Discovery. It is useful when you do not want to, or cannot, provide a list of possible IP addresses. To configure your cluster to use EC2 Auto Discovery, disable join over multicast and TCP/IP, enable AWS, and provide your credentials (access and secret keys) - see the sketch below.
❖ You need to add the hazelcast-cloud.jar dependency to your project. Note that it is also bundled inside hazelcast-all.jar. The Hazelcast cloud module does not depend on any other third-party modules.
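None of the three discovery slides shows the matching configuration, so here is a sketch of the join section of hazelcast.xml in the Hazelcast 3.x schema. Enable exactly one of the three mechanisms, and treat all addresses and keys as placeholders:

    <hazelcast>
        <network>
            <join>
                <!-- Multicast discovery (the default). -->
                <multicast enabled="true">
                    <multicast-group>224.2.2.3</multicast-group>
                    <multicast-port>54327</multicast-port>
                </multicast>

                <!-- TCP/IP discovery: list at least one known member. -->
                <tcp-ip enabled="false">
                    <member>192.168.1.10</member>
                    <member>192.168.1.11</member>
                </tcp-ip>

                <!-- EC2 auto discovery (requires hazelcast-cloud.jar). -->
                <aws enabled="false">
                    <access-key>YOUR_ACCESS_KEY_ID</access-key>
                    <secret-key>YOUR_SECRET_ACCESS_KEY</secret-key>
                </aws>
            </join>
        </network>
    </hazelcast>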
Distributed Data Structures

❖ Map: The distributed implementation of java.util.Map lets you read from and write to a Hazelcast map with methods like get and put.
❖ Queue: The distributed queue is an implementation of java.util.concurrent.BlockingQueue. You can add an item on one machine and remove it from another.
❖ Set: The distributed and concurrent implementation of java.util.Set. It does not allow duplicate elements and does not preserve their order.
❖ List: Very similar to Hazelcast Set, except that it allows duplicate elements and preserves their order.
❖ MultiMap: A specialized Hazelcast map. It is distributed, and multiple values can be stored under a single key.
❖ ReplicatedMap: Does not partition data, i.e. it does not spread data to different cluster members. Instead, it replicates the data to all nodes.

Data Replication

❖ If a member goes down, its backup replica (which holds the same data) dynamically redistributes the data, including the ownership of and locks on it, to the remaining live nodes. As a result, no data is lost.
❖ There is no single cluster master that could become a single point of failure. Every node in the cluster has equal rights and responsibilities; no single node is superior, and there is no dependency on an external 'server' or 'master'.

Data Backup

❖ Configuration:

    <hazelcast>
        <map name="default">
            <backup-count>1</backup-count>
        </map>
    </hazelcast>

❖ backup-count 0 - no backups
❖ backup-count 6 - the maximum number of backups
❖ A higher backup-count decreases overall cluster performance.

Map Eviction

❖ Unless you delete map entries manually or use an eviction policy, they remain in the map. Hazelcast supports policy-based eviction for distributed maps. Currently supported policies are LRU (Least Recently Used) and LFU (Least Frequently Used).

    <hazelcast>
        <map name="default">
            ...
            <time-to-live-seconds>0</time-to-live-seconds>
            <max-idle-seconds>0</max-idle-seconds>
            <eviction-policy>LRU</eviction-policy>
            <max-size policy="PER_NODE">5000</max-size>
            <eviction-percentage>25</eviction-percentage>
            ...
        </map>
    </hazelcast>

Map Persistence

❖ Hazelcast allows you to load and store distributed map entries from/to a persistent data store such as a relational database. To do this, you can use Hazelcast's MapStore and MapLoader interfaces.

    public class PersonMapStore implements MapStore<Long, Person> {
        private final Connection con;

        public PersonMapStore() {
            try {
                con = DriverManager.getConnection("jdbc:hsqldb:mydatabase", "SA", "");
                con.createStatement().executeUpdate(
                    "create table if not exists person (id bigint, name varchar(45))");
            }
            catch (SQLException e) {
                throw new RuntimeException(e);
            }
        }

        public synchronized void delete(Long key) {
            System.out.println("Delete:" + key);
            try {
                con.createStatement().executeUpdate(
                    format("delete from person where id = %s", key));
            }
            catch (SQLException e) {
                throw new RuntimeException(e);
            }
        }

        // The remaining MapStore/MapLoader methods (store, storeAll, load,
        // loadAll, loadAllKeys, deleteAll) are omitted on the slide.
    }
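The slide shows the store implementation but not how it is registered with a map. In Hazelcast 3.x this is done with the map-store element of the map configuration; the com.example package below is a placeholder for wherever PersonMapStore actually lives:

    <hazelcast>
        <map name="default">
            <map-store enabled="true">
                <!-- The MapStore implementation from the example above. -->
                <class-name>com.example.PersonMapStore</class-name>
                <!-- 0 means write-through; a positive value enables write-behind. -->
                <write-delay-seconds>0</write-delay-seconds>
            </map-store>
        </map>
    </hazelcast>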
Entry Listener

❖ You can listen to map entry events. The Hazelcast distributed map offers the addEntryListener method to add an entry listener to the map.

    public class Listen {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IMap<String, String> map = hz.getMap("somemap");
            map.addEntryListener(new MyEntryListener(), true);
            System.out.println("EntryListener registered");
        }

        static class MyEntryListener implements EntryListener<String, String> {
            @Override
            public void entryAdded(EntryEvent<String, String> event) {
                System.out.println("Entry Added:" + event);
            }

            // The remaining EntryListener callbacks (entryRemoved, entryUpdated,
            // entryEvicted, ...) are omitted on the slide.
        }
    }

Interceptors

❖ Interceptors are different from listeners:
   ❖ with listeners, you take an action after the operation has been completed;
   ❖ interceptor actions are synchronous, and you can alter the behavior of the operation, change its values, or cancel it entirely.

Interceptors Example

Interceptor:

    public interface MapInterceptor extends Serializable {
        /**
         * Intercept the get operation before it returns a value.
         * Return another object to change the return value of get(..).
         * Returning null will cause the get(..) operation to return the original value,
         * namely return null if you do not want to change anything.
         *
         * @param value the original value to be returned as the result of get(..) operation
         * @return the new value that will be returned by get(..) operation
         */
        Object interceptGet(Object value);

        /**
         * Called after get(..) operation is completed.
         *
         * @param value the value returned as the result of get(..) operation
         */
        void afterGet(Object value);

        // The put- and remove-interception methods of the interface are
        // omitted on the slide.
    }

Example usage:

    public class InterceptorTest {
        @Test
        public void testMapInterceptor() throws InterruptedException {
            HazelcastInstance hazelcastInstance1 = Hazelcast.newHazelcastInstance();
            HazelcastInstance hazelcastInstance2 = Hazelcast.newHazelcastInstance();

            IMap<Object, Object> map = hazelcastInstance1.getMap("testMapInterceptor");

            // SimpleInterceptor (a MapInterceptor implementation) is not shown on the slide.
            SimpleInterceptor interceptor = new SimpleInterceptor();

            map.addInterceptor(interceptor);
            map.put(1, "New York");
            map.put(2, "Istanbul");
        }
    }

Distributed Jobs Execution

❖ The distributed executor service is a distributed implementation of java.util.concurrent.ExecutorService.
❖ You can have Hazelcast execute your code (Runnable, Callable):
   ❖ on a specific cluster member you choose,
   ❖ on the member owning the key you choose,
   ❖ on the member Hazelcast will pick, and
   ❖ on all or a subset of the cluster members.

Distributed Jobs Execution

❖ On the member owning the key you choose:

    public void echoOnTheMemberOwningTheKey(String input, Object key) throws Exception {
        Callable<String> task = new Echo(input);
        HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
        IExecutorService executorService =
            hazelcastInstance.getExecutorService("default");

        Future<String> future = executorService.submitToKeyOwner(task, key);
        String echoResult = future.get();
    }
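The Echo task used above is not defined on the slide. A minimal sketch of what it presumably looks like; the key point is that it must be Serializable so it can be sent to the member owning the key:

    import java.io.Serializable;
    import java.util.concurrent.Callable;

    public class Echo implements Callable<String>, Serializable {
        private final String input;

        public Echo(String input) {
            this.input = input;
        }

        @Override
        public String call() {
            // Executed on the cluster member that owns the submitted key.
            return input;
        }
    }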
Distributed Query

❖ The requested predicate is sent to each member in the cluster.
❖ Each member looks at its own local entries and filters them according to the predicate. At this stage, the key/value pairs of the entries are deserialized and then passed to the predicate.
❖ Then the predicate requester merges all the results coming from each member into a single set.
❖ Hazelcast offers the following APIs for distributed query purposes:
   ❖ Criteria API
   ❖ Distributed SQL Query

Distributed Query Criteria API

❖ equal: checks if the result of an expression is equal to a given value.
❖ notEqual: checks if the result of an expression is not equal to a given value.
❖ instanceOf: checks if the result of an expression has a certain type.
❖ like: checks if the result of an expression matches some string pattern. % (percent sign) is a placeholder for multiple characters, _ (underscore) is a placeholder for a single character.
❖ greaterThan: checks if the result of an expression is greater than a certain value.
❖ greaterEqual: checks if the result of an expression is greater than or equal to a certain value.
❖ lessThan: checks if the result of an expression is less than a certain value.
❖ lessEqual: checks if the result of an expression is less than or equal to a certain value.
❖ between: checks if the result of an expression is between two values (inclusive).
❖ in: checks if the result of an expression is an element of a certain collection.
❖ isNot: checks if the result of an expression is false.
❖ regex: checks if the result of an expression matches some regular expression.

Distributed Query Criteria API

Given the following data:

    IMap<String, Employee> map = hazelcastInstance.getMap("employee");

    EntryObject e = new PredicateBuilder().getEntryObject();
    Predicate predicate = e.is("active").and(e.get("age").lessThan(30));

    Set<Employee> employees = (Set<Employee>) map.values(predicate);

Example:

    public Set<Person> getWithNameAndAge(String name, int age) {
        Predicate namePredicate = Predicates.equal("name", name);
        Predicate agePredicate = Predicates.equal("age", age);
        Predicate predicate = Predicates.and(namePredicate, agePredicate);
        return (Set<Person>) personMap.values(predicate);
    }

    public Set<Person> getWithNameOrAge(String name, int age) {
        Predicate namePredicate = Predicates.equal("name", name);
        Predicate agePredicate = Predicates.equal("age", age);
        Predicate predicate = Predicates.or(namePredicate, agePredicate);
        return (Set<Person>) personMap.values(predicate);
    }

Distributed Query SQL Query

Given the following data:

    IMap<String, Employee> map = hazelcastInstance.getMap("employee");

Example:

    Set<Employee> employees =
        (Set<Employee>) map.values(new SqlPredicate("active AND age < 30"));

MapReduce Use Cases

The best-known examples of MapReduce algorithms are text processing tools, such as counting word frequency in large texts or websites. Apart from that, there are more interesting use cases listed below.
• Log Analysis
• Data Querying
• Aggregation and summing
• Distributed Sort
• ETL (Extract Transform Load)
• Credit and Risk management
• Fraud detection
• and more...

MapReduce Phases:

❖ map - parses the input into key-value pairs
❖ combine - optional but highly recommended; values for the same key are combined locally to lower the traffic (often similar to the reducer)
❖ grouping / shuffling - entries with the same key are sent to the same reducer
❖ reduce - the final calculation

MapReduce Job:

    IMap<String, String> map = hazelcastInstance.getMap("articles");
    KeyValueSource<String, String> source = KeyValueSource.fromMap(map);
    Job<String, String> job = jobTracker.newJob(source);

    ICompletableFuture<Map<String, Long>> future = job
        .mapper(new TokenizerMapper())
        .combiner(new WordCountCombinerFactory())
        .reducer(new WordCountReducerFactory())
        .submit();

    // Attach a callback listener.
    future.andThen(buildCallback());

    // Wait and retrieve the result.
    Map<String, Long> result = future.get();
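The snippet above uses jobTracker without showing where it comes from; in the Hazelcast 3.x MapReduce API it is obtained from the instance as sketched below. (buildCallback() is likewise left undefined on the slide; it would return an ExecutionCallback for future.andThen().)

    // Obtain the JobTracker that coordinates the MapReduce job.
    // "default" is the tracker name; it can be configured in hazelcast.xml.
    JobTracker jobTracker = hazelcastInstance.getJobTracker("default");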
MapReduce Mapper:

    public class TokenizerMapper implements Mapper<String, String, String, Long> {
        private static final Long ONE = Long.valueOf(1L);

        @Override
        public void map(String key, String document, Context<String, Long> context) {
            StringTokenizer tokenizer = new StringTokenizer(document.toLowerCase());
            while (tokenizer.hasMoreTokens()) {
                context.emit(tokenizer.nextToken(), ONE);
            }
        }
    }

MapReduce Combiner:

    public class WordCountCombinerFactory implements CombinerFactory<String, Long, Long> {
        @Override
        public Combiner<Long, Long> newCombiner(String key) {
            return new WordCountCombiner();
        }

        private class WordCountCombiner extends Combiner<Long, Long> {
            private long sum = 0;

            @Override
            public void combine(Long value) {
                // Each incoming value is 1, so counting them equals summing them.
                sum++;
            }

            @Override
            public Long finalizeChunk() {
                return sum;
            }

            @Override
            public void reset() {
                sum = 0;
            }
        }
    }

MapReduce Reducer:

    public class WordCountReducerFactory implements ReducerFactory<String, Long, Long> {
        @Override
        public Reducer<Long, Long> newReducer(String key) {
            return new WordCountReducer();
        }

        private class WordCountReducer extends Reducer<Long, Long> {
            private volatile long sum = 0;

            @Override
            public void reduce(Long value) {
                sum += value.longValue();
            }

            @Override
            public Long finalizeReduce() {
                return sum;
            }
        }
    }

Links

❖ http://doc.gridgain.org/latest/Home
❖ http://docs.hazelcast.org/docs/3.3/manual/htmlsingle/hazelcast-documentation.html#preface