
Cloud Programming on Java EE Platforms
mgr inż. Piotr Nowak
GridGain
❖ Two main versions:
  ❖ Open Source
  ❖ Enterprise
❖ http://www.gridgain.com
❖ http://www.gridgain.com/products/in-memory-datafabric/editions/
2
Install
❖ Download, unpack, use
  ❖ www.gridgain.com/products/in-memory-datafabric/editions/
❖ export GRIDGAIN_HOME=<GridGain_install_dir>
3
Setup
❖ configure client project with Maven
<dependency>
    <groupId>org.gridgain</groupId>
    <artifactId>gridgain-fabric</artifactId>
    <version>${version}</version>
    <type>pom</type>
</dependency>
❖ start cluster with
  ❖ $ bin/ggstart.sh <optional configuration>
❖ start an embedded node from a Maven project with
  ❖ Grid g = GridGain.start(<optional configuration>);
❖ default configuration file: config/default-config.xml (see the sketch below)
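A minimal, self-contained starter built from the pieces above; this is a sketch assuming the GridGain 6.x API used throughout these slides (GridGain.start(), Grid.nodes(), GridGain.stop(boolean)):

import org.gridgain.grid.Grid;
import org.gridgain.grid.GridException;
import org.gridgain.grid.GridGain;

public class StartNode {
    public static void main(String[] args) throws GridException {
        // Starts a node using config/default-config.xml from GRIDGAIN_HOME.
        Grid grid = GridGain.start();
        try {
            System.out.println(">>> Nodes in topology: " + grid.nodes().size());
        } finally {
            // Stop the node; 'true' cancels jobs still running on it.
            GridGain.stop(true);
        }
    }
}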
4
Configuration
❖ GridGain environment configuration
  ❖ nodes discovery type
  ❖ specify IP address range for discovery
  ❖ etc.
❖ two methods
  ❖ programmatic config - GridConfiguration class, passed as GridGain.start() parameter (see the sketch below)
  ❖ Spring XML config, passed as command line argument for ggstart.sh
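A hedged sketch of the programmatic route, mirroring the Spring XML discovery setup shown on the later slides; it assumes the GridGain 6.x discovery SPI classes (GridTcpDiscoverySpi, GridTcpDiscoveryVmIpFinder):

import java.util.Arrays;
import org.gridgain.grid.Grid;
import org.gridgain.grid.GridConfiguration;
import org.gridgain.grid.GridGain;
import org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi;
import org.gridgain.grid.spi.discovery.tcp.ipfinder.vm.GridTcpDiscoveryVmIpFinder;

// Configure static-IP discovery in code instead of Spring XML.
GridTcpDiscoveryVmIpFinder ipFinder = new GridTcpDiscoveryVmIpFinder();
ipFinder.setAddresses(Arrays.asList("1.2.3.4:47500..47509", "10.10.10.9:47500"));

GridTcpDiscoverySpi discoSpi = new GridTcpDiscoverySpi();
discoSpi.setIpFinder(ipFinder);

GridConfiguration cfg = new GridConfiguration();
cfg.setDiscoverySpi(discoSpi);

// Pass the configuration object directly to start().
Grid grid = GridGain.start(cfg);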
5
Characteristics
❖ All nodes have the same role
  ❖ there is no Management Node, so there is no single point of failure
❖ By default data is stored in distributed structures in memory
  ❖ the default behavior is to work as an In-Memory Data Grid
❖ GridGain can work with a layer containing HDFS
❖ data replication - in case of node failure, data is automatically backed up on other nodes
6
GridGain Facade
❖ Class Grid - the default API for the cluster
❖ access to cluster items
❖ cluster management
  ❖ start, restart, stop etc.
❖ example:
// Create new configuration.
GridConfiguration cfg = new GridConfiguration();

// Provide lifecycle bean to configuration.
cfg.setLifecycleBeans(new MyGridLifecycleBean());

// Start grid with given configuration.
Grid grid = GridGain.start(cfg);

// Alternatively, start with the default configuration.
// Grid grid = GridGain.start();
7
GridGain Facade
❖ GridProjection - superinterface of Grid
❖ access to a group of nodes
  ❖ select nodes based on certain rules
  ❖ nodes filtering
// Get projection with random node out of remote nodes.
GridProjection randomNode = grid.forRemotes().forRandom();

// Get projection with nodes residing on the same host with random node.
GridProjection hostNodes = grid.forHost(randomNode.node());

// Get projection of all nodes that have "worker" attribute defined and
// have current CPU load less than 50%.
GridProjection predicateNodes = grid.forPredicate(new GridPredicate<GridNode>() {
    @Override public boolean apply(GridNode n) {
        return n.attribute("worker") != null && n.metrics().getCurrentCpuLoad() < 0.5;
    }
});
8
Nodes
❖ One physical machine is not limited to one GridGain node instance (see the sketch below)
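A minimal sketch of two nodes started in one JVM; it assumes GridConfiguration.setGridName is available to keep the instances distinct (the names here are hypothetical):

GridConfiguration cfg1 = new GridConfiguration();
cfg1.setGridName("node-1"); // hypothetical instance name

GridConfiguration cfg2 = new GridConfiguration();
cfg2.setGridName("node-2"); // hypothetical instance name

// Both nodes join the same topology and discover each other on this host.
Grid g1 = GridGain.start(cfg1);
Grid g2 = GridGain.start(cfg2);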
9
Node Discovery
❖ Multicast Based Discovery
❖ Static IP Based Discovery
❖ Multicast and Static IP Based Discovery
❖ Amazon S3 Based Discovery
10
Multicast
❖ GridTcpDiscoveryMulticastIpFinder uses Multicast to discover other nodes in the grid and is the default IP finder. You should not have to specify it unless you plan to override default settings. Here is an example of how to configure this finder via Spring XML file:
<bean class="org.gridgain.grid.GridConfiguration">
    ...
    <property name="discoverySpi">
        <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
            <property name="ipFinder">
                <bean class="org.gridgain.grid.spi.discovery.tcp.ipfinder.multicast.GridTcpDiscoveryMulticastIpFinder">
                    <property name="multicastGroup" value="228.10.10.157"/>
                </bean>
            </property>
        </bean>
    </property>
</bean>
11
Static IP
❖
For cases when Multicast is disabled, GridTcpDiscoveryVmIpFinder should be used with a preconfigured list of IP addresses. You are only required to provide at least one IP address, but usually it is advisable to provide 2 or 3 addresses of the grid nodes that you plan to start first, for redundancy. Once a connection to any of the provided IP addresses is established, GridGain will automatically discover all other grid nodes.
<bean class="org.gridgain.grid.GridConfiguration">
    ...
    <property name="discoverySpi">
        <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
            <property name="ipFinder">
                <bean class="org.gridgain.grid.spi.discovery.tcp.ipfinder.vm.GridTcpDiscoveryVmIpFinder">
                    <property name="addresses">
                        <list>
                            <value>1.2.3.4:47500..47509</value> <!-- IP address and port range -->
                            <value>10.10.10.9:47500</value>
                        </list>
                    </property>
                </bean>
            </property>
        </bean>
    </property>
</bean>
12
Multicast and Static IP
❖
You can use both Multicast and Static IP based discovery together. In this case, in addition to addresses received via multicast, if any, GridTcpDiscoveryMulticastIpFinder can also work with a preconfigured list of static IP addresses, just like Static IP Based Discovery described above. Here is an example of how to configure the Multicast IP finder with static IP addresses:
<bean class="org.gridgain.grid.GridConfiguration">
    ...
    <property name="discoverySpi">
        <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
            <property name="ipFinder">
                <bean class="org.gridgain.grid.spi.discovery.tcp.ipfinder.multicast.GridTcpDiscoveryMulticastIpFinder">
                    <property name="multicastGroup" value="228.10.10.157"/>
                    <!-- list of static IP addresses -->
                    <property name="addresses">
                        <list>
                            <value>1.2.3.4:47500..47509</value> <!-- IP address and port range -->
                            <value>10.10.10.9:47500</value>
                        </list>
                    </property>
                </bean>
            </property>
        </bean>
    </property>
</bean>
13
Amazon S3
❖
In the Amazon AWS cloud, where Multicast is disabled, GridGain supports automatic node discovery by utilizing the S3 store via GridTcpDiscoveryS3IpFinder. On startup, nodes register their IP addresses with the Amazon S3 store. This way other nodes can try to connect to any of the IP addresses stored in S3 and initiate automatic grid node discovery.
<bean class="org.gridgain.grid.GridConfiguration">
    ...
    <property name="discoverySpi">
        <bean class="org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi">
            <property name="ipFinder">
                <bean class="org.gridgain.grid.spi.discovery.tcp.ipfinder.s3.GridTcpDiscoveryS3IpFinder">
                    <property name="awsCredentials" ref="aws.creds"/>
                    <property name="bucketName" value="YOUR_BUCKET_NAME_IP_FINDER"/>
                </bean>
            </property>
        </bean>
    </property>
</bean>

<!-- AWS credentials. Provide your access key ID and secret access key. -->
<bean id="aws.creds" class="com.amazonaws.auth.BasicAWSCredentials">
    <constructor-arg value="YOUR_ACCESS_KEY_ID" />
    <constructor-arg value="YOUR_SECRET_ACCESS_KEY" />
</bean>
14
How to connect to the cluster?
❖ Create a Java project
❖ use the GridGain class
  ❖ GridGain has to be started with the start() method
  ❖ obtain the Grid instance with GridGain.grid()
❖ implement the computation (see the sketch below)
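A sketch of these steps put together, again assuming the GridGain 6.x API from the earlier slides:

import org.gridgain.grid.Grid;
import org.gridgain.grid.GridException;
import org.gridgain.grid.GridGain;

public class ClusterClient {
    public static void main(String[] args) throws GridException {
        // Join the cluster; discovery settings come from the configuration.
        GridGain.start();

        // Obtain the started instance anywhere in the code.
        Grid grid = GridGain.grid();
        System.out.println(">>> Remote nodes: " + grid.forRemotes().nodes().size());

        GridGain.stop(true);
    }
}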
15
Distributed Jobs
❖ Several methods for job execution using the GridCompute class
❖ GridCompute defines compute grid functionality for executing tasks and closures over nodes in the GridProjection. An instance of GridCompute is obtained from a grid projection as follows:
GridCompute c = GridGain.grid().compute();
❖ The methods are grouped as follows:
  ◦ apply(...) methods execute GridClosure jobs over nodes in the projection.
  ◦ call(...) methods execute Callable jobs over nodes in the projection. Use GridCallable for better performance as it implements Serializable (see the sketch after this list).
  ◦ run(...) methods execute Runnable jobs over nodes in the projection. Use GridRunnable for better performance as it implements Serializable.
  ◦ broadcast(...) methods broadcast jobs to all nodes in the projection.
  ◦ affinity(...) methods colocate jobs with nodes on which a specified key is cached.
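To complement the run() example on the next slide, a hedged call() sketch that collects results; it assumes a started Grid instance named grid and the GridGain 6.x GridCompute.call(Collection) overload returning a future over the collected results:

Collection<GridCallable<Integer>> calls = new ArrayList<>();

// One callable per word; each job returns the length of its word.
for (final String word : "Count characters using callables".split(" ")) {
    calls.add(new GridCallable<Integer>() {
        @Override public Integer call() {
            return word.length();
        }
    });
}

// Execute on the cluster and sum the partial results.
int sum = 0;
for (int len : grid.compute().call(calls).get())
    sum += len;

System.out.println(">>> Total characters: " + sum);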
16
Example Job Runnable
Collection<GridFuture<?>> futs = new ArrayList<>();

// Iterate through all words in the sentence and create runnable jobs.
for (final String word : "Print words using runnable".split(" ")) {
    // Execute runnable on some node.
    futs.add(grid.compute().run(new GridRunnable() {
        @Override public void run() {
            System.out.println(">>> Printing '" + word + "' on this node from grid job.");
        }
    }));
}
17
MapReduce
❖ Map
  ❖ This method instantiates the jobs and maps them to worker nodes. The method receives the grid projection on which the task is run and the task argument. The method should return a map with jobs as keys and mapped worker nodes as values. The jobs are then sent to the mapped nodes and executed there.
❖ Result
  ❖ This method is called each time a job is completed on a worker node. The first argument is the result returned by the completed job. The second argument holds the list of already received results. The method should return a GridComputeJobResultPolicy instance, indicating what to do next:
    ❖ WAIT - wait for all remaining jobs to complete (if any)
    ❖ REDUCE - immediately move to the reduce step, discarding all remaining jobs and not-yet-received results
    ❖ FAILOVER - failover the job to another node (see Fault Tolerance)
❖ Reduce
  ❖ This method is called on the reduce step, when all jobs have completed (or the REDUCE result policy is returned from result()). The method receives a list with all the collected results and should return the final result of the computation.
18
MapReduce example
// Execute task on the grid and wait for its completion.
int cnt = grid.compute().execute(CharacterCountTask.class, "Hello Grid Enabled World!").get();

System.out.println(">>> Total number of characters in the phrase is '" + cnt + "'.");

/**
 * Task to count non-white-space characters in a phrase.
 */
private static class CharacterCountTask extends GridComputeTaskAdapter<String, Integer> {
    // Splits the received string into words, creates a child job for each word, and sends
    // these jobs to other nodes for processing. Each job prints the received word and
    // returns its length.
    @Override
    public Map<? extends GridComputeJob, GridNode> map(List<GridNode> subgrid, String arg) {
        String[] words = arg.split(" ");

        Map<GridComputeJob, GridNode> map = new HashMap<>(words.length);

        Iterator<GridNode> it = subgrid.iterator();

        for (final String word : words) {
            // If we used all nodes, restart the iterator.
            if (!it.hasNext())
                it = subgrid.iterator();

            GridNode node = it.next();

            map.put(new GridComputeJobAdapter() {
                @Nullable @Override public Object execute() {
                    System.out.println(">>> Printing '" + word + "' on this node from grid job.");

                    // Return number of letters in the word.
                    return word.length();
                }
            }, node);
        }

        return map;
    }

    // Sums the per-word lengths returned by the jobs.
    @Nullable @Override
    public Integer reduce(List<GridComputeJobResult> results) {
        int sum = 0;

        for (GridComputeJobResult res : results)
            sum += res.<Integer>getData();

        return sum;
    }
}
19
Client Node
❖ Two types of data storage in the grid:
  ❖ REPLICATED - the same data stored on all grid nodes
  ❖ PARTITIONED - data distributed between participating nodes
❖ Client Node
  ❖ does not store any data - a node configured with GridCacheDistributionMode.CLIENT_ONLY (see the sketch below)
  ❖ can be used as an entry node to the cluster
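A sketch of a client-only node, assuming the GridGain 6.x cache configuration API (GridCacheConfiguration.setDistributionMode); the cache name is illustrative:

GridCacheConfiguration cacheCfg = new GridCacheConfiguration();
cacheCfg.setName("partitioned");                  // hypothetical cache name
cacheCfg.setCacheMode(GridCacheMode.PARTITIONED);
cacheCfg.setDistributionMode(GridCacheDistributionMode.CLIENT_ONLY); // store no data locally

GridConfiguration cfg = new GridConfiguration();
cfg.setCacheConfiguration(cacheCfg);

// This node joins the topology but holds no cache data.
Grid client = GridGain.start(cfg);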
20
Hazelcast
❖ Two main versions:
  ❖ Open Source
  ❖ Enterprise
❖ http://hazelcast.org
❖ http://hazelcast.com/products/
21
Install
❖ Download, unpack, use
  ❖ http://hazelcast.org/download/
22
Setup
❖ configure client project with Maven
<dependencies>
    <dependency>
        <groupId>com.hazelcast</groupId>
        <artifactId>hazelcast</artifactId>
        <version>3.4</version>
    </dependency>
</dependencies>
❖ start cluster with bin/server.sh, updated with:
  ❖ JAVA_HOME=<your java home>
  ❖ HAZELCAST_HOME=<your directory with hazelcast>
23
Configuration
❖ two methods
  ❖ programmatic with the Config class (see the sketch below)
  ❖ configuration file hazelcast.xml
❖ for basic usage the default file should be enough
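A minimal programmatic sketch using the 3.4 Config API; the cluster name is a hypothetical example:

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class ProgrammaticSetup {
    public static void main(String[] args) {
        // Build the configuration in code instead of hazelcast.xml.
        Config config = new Config();
        config.getGroupConfig().setName("lab-cluster"); // hypothetical cluster name
        config.getMapConfig("default").setBackupCount(1);

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        System.out.println("Members: " + hz.getCluster().getMembers());
    }
}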
24
Characteristics
❖ All nodes have the same role
  ❖ there is no Management Node, so there is no single point of failure
❖ By default data is stored in distributed structures in memory
  ❖ the default behavior is to work as an In-Memory Data Grid
❖ data replication - in case of node failure, data is automatically backed up on other nodes
25
Node Discovery
❖ Multicast Based Discovery
❖ Discovery by TCP/IP
❖ Amazon EC2 Based Discovery
26
Multicast
❖ With the multicast auto-discovery mechanism, Hazelcast allows cluster members to find each other using multicast communication.
❖ The cluster members do not need to know the concrete addresses of each other; they just multicast to everyone listening.
❖ Whether multicast is possible or allowed depends on your environment (configuration sketch below).
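A sketch of enabling multicast join explicitly (it is on by default); the group and port shown are the documented Hazelcast defaults:

Config config = new Config();
JoinConfig join = config.getNetworkConfig().getJoin();
join.getTcpIpConfig().setEnabled(false);
join.getMulticastConfig().setEnabled(true)
    .setMulticastGroup("224.2.2.3")   // default multicast group
    .setMulticastPort(54327);         // default multicast port
Hazelcast.newHazelcastInstance(config);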
27
TCP/IP
❖ If multicast is not the preferred way of discovery for your environment, then you can configure Hazelcast to be a full TCP/IP cluster.
❖ When configuring Hazelcast for discovery by TCP/IP, you must list all or a subset of the nodes' hostnames and/or IP addresses.
❖ Note that you do not have to list all cluster members, but at least one of the listed members has to be active in the cluster when a new member joins (configuration sketch below).
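A corresponding TCP/IP join sketch; the member addresses are illustrative seed entries, not a full member list:

Config config = new Config();
JoinConfig join = config.getNetworkConfig().getJoin();
join.getMulticastConfig().setEnabled(false);  // multicast off
join.getTcpIpConfig().setEnabled(true)
    .addMember("10.10.10.9")                  // seed members; not all are required
    .addMember("10.10.10.10:5701");
Hazelcast.newHazelcastInstance(config);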
28
Amazon EC2
❖
Hazelcast supports EC2 Auto Discovery. It is useful when you
do not want or cannot provide the list of possible IP addresses.
To configure your cluster to use EC2 Auto Discovery, disable
join over multicast and TCP/IP, enable AWS, and provide your
credentials (access and secret keys).
❖
You need to add hazelcast-cloud.jar dependency to your
project. Note that it is also bundled inside hazelcast-all.jar. The
Hazelcast cloud module does not depend on any other third
party modules.
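A sketch of EC2 auto discovery with the AwsConfig API, assuming hazelcast-cloud is on the classpath; the keys and region are placeholders:

Config config = new Config();
JoinConfig join = config.getNetworkConfig().getJoin();
join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig().setEnabled(false);
join.getAwsConfig().setEnabled(true)
    .setAccessKey("YOUR_ACCESS_KEY_ID")       // placeholder
    .setSecretKey("YOUR_SECRET_ACCESS_KEY")   // placeholder
    .setRegion("us-east-1");                  // placeholder region
Hazelcast.newHazelcastInstance(config);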
29
Distributed
Data Structures
❖ Map: The distributed implementation of java.util.Map lets you read from and write to a Hazelcast map with methods like get and put.
❖ Queue: The distributed queue is an implementation of java.util.concurrent.BlockingQueue. You can add an item on one machine and remove it from another one.
❖ Set: The distributed and concurrent implementation of java.util.Set. It does not allow duplicate elements and does not preserve their order.
❖ List: Very similar to Hazelcast Set, except that it allows duplicate elements and preserves their order.
❖ MultiMap: A specialized, distributed Hazelcast map where multiple values can be stored under a single key.
❖ ReplicatedMap: Does not partition data, i.e. it does not spread data to different cluster members. Instead, it replicates the data to all nodes (usage sketch below).
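A short usage sketch for three of these structures; the structure names ("ages", "tasks", "tags") are illustrative:

HazelcastInstance hz = Hazelcast.newHazelcastInstance();

// Distributed map: read and write like java.util.Map.
IMap<String, Integer> ages = hz.getMap("ages");
ages.put("alice", 30);

// Distributed queue: offer on one member, poll on another.
IQueue<String> tasks = hz.getQueue("tasks");
tasks.offer("task-1");

// MultiMap: multiple values under a single key.
MultiMap<String, String> tags = hz.getMultiMap("tags");
tags.put("post-1", "java");
tags.put("post-1", "cloud");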
30
Data Replication
❖
If a member goes down, its backup replica (which holds
the same data) will dynamically redistribute the data,
including the ownership and locks on them, to the
remaining live nodes. As a result, no data will be lost.
❖
There is no single cluster master that can cause single
point of failure. Every node in the cluster has equal
rights and responsibilities. No single node is superior.
There is no dependency on an external ‘server’ or
‘master’.
31
Data Backup
❖ Configuration
<hazelcast>
    <map name="default">
        <backup-count>1</backup-count>
    </map>
</hazelcast>
❖ backup-count 0 - no backups
❖ backup-count 6 - maximum number of backups
❖ a higher backup-count decreases overall cluster performance (see the sketch below)
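The same setting can be applied programmatically; a sketch using the MapConfig API:

Config config = new Config();
// 0..6 synchronous backup copies per entry; 1 is the default.
config.getMapConfig("default").setBackupCount(1);
Hazelcast.newHazelcastInstance(config);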
32
Map Eviction
❖ Unless you delete the map entries manually or use an eviction policy, they will remain in the map. Hazelcast supports policy-based eviction for distributed maps. Currently supported policies are LRU (Least Recently Used) and LFU (Least Frequently Used).
<hazelcast>
    <map name="default">
        ...
        <time-to-live-seconds>0</time-to-live-seconds>
        <max-idle-seconds>0</max-idle-seconds>
        <eviction-policy>LRU</eviction-policy>
        <max-size policy="PER_NODE">5000</max-size>
        <eviction-percentage>25</eviction-percentage>
        ...
    </map>
</hazelcast>
33
Map Persistence
❖ Hazelcast allows you to load and store the distributed map entries from/to a persistent data store such as a relational database. To do this, you can use Hazelcast's MapStore and MapLoader interfaces.
public class PersonMapStore implements MapStore<Long, Person> {
    private final Connection con;

    public PersonMapStore() {
        try {
            con = DriverManager.getConnection("jdbc:hsqldb:mydatabase", "SA", "");
            con.createStatement().executeUpdate(
                "create table if not exists person (id bigint, name varchar(45))");
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    public synchronized void delete(Long key) {
        System.out.println("Delete:" + key);
        try {
            con.createStatement().executeUpdate(
                format("delete from person where id = %s", key));
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    // Remaining MapStore methods (store, storeAll, load, loadAll, ...) omitted on the slide.
}
34
Entry Listener
❖ You can listen to map entry events. The Hazelcast distributed map offers the addEntryListener method to add an entry listener to the map.
public class Listen {
    public static void main( String[] args ) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, String> map = hz.getMap( "somemap" );
        map.addEntryListener( new MyEntryListener(), true );
        System.out.println( "EntryListener registered" );
    }

    static class MyEntryListener implements EntryListener<String, String> {
        @Override
        public void entryAdded( EntryEvent<String, String> event ) {
            System.out.println( "Entry Added:" + event );
        }

        // Other EntryListener callbacks (entryRemoved, entryUpdated, entryEvicted, ...)
        // omitted on the slide.
    }
}
35
Interceptors
❖ Interceptors are different from listeners.
❖ With listeners, you take an action after the operation has been completed.
❖ Interceptor actions are synchronous, and you can alter the behavior of the operation, change its values, or cancel it entirely.
36
Interceptors
Example Interceptor (MapInterceptor excerpt):
public interface MapInterceptor extends Serializable {
    /**
     * Intercept the get operation before it returns a value.
     * Return another object to change the return value of get(..).
     * Returning null will cause the get(..) operation to return the original value,
     * namely return null if you do not want to change anything.
     *
     * @param value the original value to be returned as the result of get(..) operation
     * @return the new value that will be returned by get(..) operation
     */
    Object interceptGet( Object value );

    /**
     * Called after get(..) operation is completed.
     *
     * @param value the value returned as the result of get(..) operation
     */
    void afterGet( Object value );

    // ... analogous intercept/after methods exist for put and remove
}

Example Usage:
public class InterceptorTest {
    @Test
    public void testMapInterceptor() throws InterruptedException {
        HazelcastInstance hazelcastInstance1 = Hazelcast.newHazelcastInstance();
        HazelcastInstance hazelcastInstance2 = Hazelcast.newHazelcastInstance();
        IMap<Object, Object> map = hazelcastInstance1.getMap( "testMapInterceptor" );
        SimpleInterceptor interceptor = new SimpleInterceptor();
        map.addInterceptor( interceptor );
        map.put( 1, "New York" );
        map.put( 2, "Istanbul" );
    }
}
37
Distributed Jobs
Execution
❖ The distributed executor service is a distributed implementation of java.util.concurrent.ExecutorService.
❖ You can have Hazelcast execute your code (Runnable, Callable):
  ❖ on a specific cluster member you choose,
  ❖ on the member owning the key you choose,
  ❖ on the member Hazelcast will pick, and
  ❖ on all or a subset of the cluster members.
38
Distributed Jobs
Execution
❖ on the member owning the key you choose:
public void echoOnTheMemberOwningTheKey( String input, Object key ) throws Exception {
    Callable<String> task = new Echo( input );
    HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
    IExecutorService executorService = hazelcastInstance.getExecutorService( "default" );
    Future<String> future = executorService.submitToKeyOwner( task, key );
    String echoResult = future.get();
}
39
Distributed Query
❖ The requested predicate is sent to each member in the cluster.
❖ Each member looks at its own local entries and filters them according to the predicate. At this stage, key/value pairs of the entries are deserialized and then passed to the predicate.
❖ Then the predicate requester merges all the results coming from each member into a single set.
❖ Hazelcast offers the following APIs for distributed query purposes:
  ❖ Criteria API
  ❖ Distributed SQL Query
40
Distributed Query
Criteria API
❖ equal: checks if the result of an expression is equal to a given value.
❖ notEqual: checks if the result of an expression is not equal to a given value.
❖ instanceOf: checks if the result of an expression has a certain type.
❖ like: checks if the result of an expression matches some string pattern. % (percentage sign) is a placeholder for many characters, _ (underscore) is a placeholder for only one character.
❖ greaterThan: checks if the result of an expression is greater than a certain value.
❖ greaterEqual: checks if the result of an expression is greater than or equal to a certain value.
❖ lessThan: checks if the result of an expression is less than a certain value.
❖ lessEqual: checks if the result of an expression is less than or equal to a certain value.
❖ between: checks if the result of an expression is between 2 values (inclusive).
❖ in: checks if the result of an expression is an element of a certain collection.
❖ isNot: checks if the result of an expression is false.
❖ regex: checks if the result of an expression matches some regular expression.
41
Distributed Query
Criteria API
Given the following data:
IMap<String, Employee> map = hazelcastInstance.getMap( "employee" );

Example with PredicateBuilder:
EntryObject e = new PredicateBuilder().getEntryObject();
Predicate predicate = e.is( "active" ).and( e.get( "age" ).lessThan( 30 ) );
Set<Employee> employees = (Set<Employee>) map.values( predicate );

Example with Predicates:
public Set<Person> getWithNameAndAge( String name, int age ) {
    Predicate namePredicate = Predicates.equal( "name", name );
    Predicate agePredicate = Predicates.equal( "age", age );
    Predicate predicate = Predicates.and( namePredicate, agePredicate );
    return (Set<Person>) personMap.values( predicate );
}

public Set<Person> getWithNameOrAge( String name, int age ) {
    Predicate namePredicate = Predicates.equal( "name", name );
    Predicate agePredicate = Predicates.equal( "age", age );
    Predicate predicate = Predicates.or( namePredicate, agePredicate );
    return (Set<Person>) personMap.values( predicate );
}
42
Distributed Query
SQL Query
Given the following data:
IMap<String, Employee> map = hazelcastInstance.getMap( "employee" );

Example:
Set<Employee> employees = (Set<Employee>) map.values( new SqlPredicate( "active AND age < 30" ) );
43
MapReduce
Use Cases
The best-known examples of MapReduce algorithms are text processing tools, such as counting word frequency in large texts or websites. Apart from that, there are more interesting use cases listed below.
• Log Analysis
• Data Querying
• Aggregation and summing
• Distributed Sort
• ETL (Extract Transform Load)
• Credit and Risk management
• Fraud detection
• and more...
44
MapReduce
Phases:
❖ map - parses input into key-value pairs
❖ combine - optional but highly recommended - values with the same key are combined locally to lower the traffic (often similar to the reducer)
❖ grouping / shuffling - the same keys are sent to the same reducer
❖ reducer - final calculation
45
MapReduce
Job:
IMap<String, String> map = hazelcastInstance.getMap( "articles" );
KeyValueSource<String, String> source = KeyValueSource.fromMap( map );

// Obtain a JobTracker to submit the job (this line is missing on the slide).
JobTracker jobTracker = hazelcastInstance.getJobTracker( "default" );
Job<String, String> job = jobTracker.newJob( source );

ICompletableFuture<Map<String, Long>> future = job
    .mapper( new TokenizerMapper() )
    .combiner( new WordCountCombinerFactory() )
    .reducer( new WordCountReducerFactory() )
    .submit();

// Attach a callback listener (buildCallback() is a user-supplied helper, not shown).
future.andThen( buildCallback() );

// Wait and retrieve the result
Map<String, Long> result = future.get();
46
MapReduce
Mapper:
public class TokenizerMapper implements Mapper<String, String, String, Long> {
    private static final Long ONE = Long.valueOf( 1L );

    @Override
    public void map( String key, String document, Context<String, Long> context ) {
        StringTokenizer tokenizer = new StringTokenizer( document.toLowerCase() );
        while ( tokenizer.hasMoreTokens() ) {
            context.emit( tokenizer.nextToken(), ONE );
        }
    }
}
47
MapReduce
Combiner:
public class WordCountCombinerFactory
        implements CombinerFactory<String, Long, Long> {
    @Override
    public Combiner<Long, Long> newCombiner( String key ) {
        return new WordCountCombiner();
    }

    private class WordCountCombiner extends Combiner<Long, Long> {
        private long sum = 0;

        @Override
        public void combine( Long value ) {
            sum++;
        }

        @Override
        public Long finalizeChunk() {
            return sum;
        }

        @Override
        public void reset() {
            sum = 0;
        }
    }
}
48
MapReduce
Reducer:
public class WordCountReducerFactory implements ReducerFactory<String, Long, Long> {
    @Override
    public Reducer<Long, Long> newReducer( String key ) {
        return new WordCountReducer();
    }

    private class WordCountReducer extends Reducer<Long, Long> {
        private volatile long sum = 0;

        @Override
        public void reduce( Long value ) {
            sum += value.longValue();
        }

        @Override
        public Long finalizeReduce() {
            return sum;
        }
    }
}
49
Links
❖ http://doc.gridgain.org/latest/Home
❖ http://docs.hazelcast.org/docs/3.3/manual/htmlsingle/hazelcast-documentation.html#preface
50