Java for High Performance Computing
java.nio: High Performance I/O for Java
http://www.hpjava.org/courses/arl
Instructor: Bryan Carpenter
Pervasive Technology Labs, Indiana University
dbcarpen@indiana.edu

NIO: New I/O
• Prior to the J2SE 1.4 release of Java, I/O had become a bottleneck.
  – JIT performance was reaching the point where one could start to think of Java as a platform for high-performance computation, but the old java.io stream classes had too many software layers to be fast: the specification implied much copying of small chunks of data; there was no way to multiplex data from multiple sources without incurring thread context switches; and there was no way to exploit modern OS tricks for high-performance I/O, like memory-mapped files.
• New I/O changes that by providing:
  – A hierarchy of dedicated buffer classes that allow data to be moved from the JVM to the OS with minimal memory-to-memory copying, and without expensive overheads like switching byte order; effectively, buffer classes give Java a “window” on system memory.
  – A unified family of channel classes that allow data to be fed directly from buffers to files and sockets, without going through the intermediaries of the old stream classes.
  – A family of classes to directly implement selection (AKA readiness testing, AKA multiplexing) over a set of channels.
  – NIO also provides file locking for the first time in Java.

References
• The Java NIO software is part of J2SE 1.4 and later, from http://java.sun.com/j2se/1.4
• Online documentation is at: http://java.sun.com/j2se/1.4/nio
• There is an authoritative book from O’Reilly: “Java NIO”, Ron Hitchens, 2002.

Buffers

Buffers
• A Buffer object is a container for a fixed amount of data.
• It behaves something like a byte[] array, but is encapsulated in such a way that the internal storage can be a block of system memory.
  – Thus adding data to, or extracting it from, a buffer can be a very direct way of getting information between a Java program and the underlying operating system.
  – All modern OSs provide virtual memory systems that allow memory space to be mapped to files, so this also enables a very direct and high-performance route to the file system.
  – The data in a buffer can also be efficiently read from, or written to, a socket or pipe, enabling high-performance communication.
• The buffer APIs allow you to read or write from a specific location in the buffer directly; they also allow relative reads and writes, similar to sequential file access.

The java.nio.Buffer Hierarchy
[Figure: the java.nio.Buffer class hierarchy]

The ByteBuffer Class
• The most important buffer class in practice is probably the ByteBuffer class. This represents a fixed-size vector of primitive bytes.
• Important methods on this class include:
    byte get()
    byte get(int index)
    ByteBuffer get(byte[] dst, int offset, int length)
    ByteBuffer put(byte b)
    ByteBuffer put(int index, byte b)
    ByteBuffer put(byte[] src, int offset, int length)
    ByteBuffer put(ByteBuffer src)

File Position and Limit
• Apart from forms with an index parameter, these are all relative operations: they get data from, or insert data into, the buffer starting at the current position in the buffer; they also advance the position to just past the data read or written. The position property is like the file pointer in sequential file access.
• The superclass Buffer has methods for explicitly manipulating the position and related properties of buffers, e.g.:
    int position()
    Buffer position(int newPosition)
    int limit()
    Buffer limit(int newLimit)
  – The ByteBuffer or Buffer references returned by these various methods are simply references to this buffer object, not new buffers. They are provided to support cryptic invocation chaining. Feel free to ignore them.
• The limit property marks either the end of the space available for writing or, once writing is finished, the end of the data that has been written to the buffer.
  – After finishing writing, the flip() method can be called to set limit to the current value of position, and reset position to zero, ready for reading.
• Various operations implicitly work on the data between position and limit.
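The position and limit bookkeeping described above can be traced in a few lines. This is a minimal sketch; the class name FlipDemo and the byte values are illustrative choices, not part of the original slides.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class FlipDemo {
    /** Returns {position, limit, capacity, firstByteRead, remaining} after write, flip, one read. */
    static int[] writeThenFlip() {
        ByteBuffer buf = ByteBuffer.allocate(8);           // position=0, limit=8, capacity=8
        buf.put((byte) 10).put((byte) 20).put((byte) 30);  // three relative puts: position=3
        buf.flip();                                        // limit=3, position=0: ready for reading
        int first = buf.get();                             // relative get: position=1
        return new int[] { buf.position(), buf.limit(), buf.capacity(), first, buf.remaining() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(writeThenFlip()));
    }
}
```

Note that remaining() is simply limit minus position, which is the region most bulk operations work on.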

Creating Buffers
• Four interesting factory methods can be used to create a new ByteBuffer:
    ByteBuffer allocate(int capacity)
    ByteBuffer allocateDirect(int capacity)
    ByteBuffer wrap(byte[] array)
    ByteBuffer wrap(byte[] array, int offset, int length)
  These are all static methods of the ByteBuffer class.
  – allocate() creates a ByteBuffer with an ordinary Java backing array of size capacity.
  – allocateDirect() (perhaps the most interesting case) creates a direct ByteBuffer, backed by capacity bytes of system memory.
  – The wrap() methods create ByteBuffers backed by all or part of an array allocated by the user.
• The other typed buffer classes (CharBuffer, etc.) have similar factory methods, except they don’t support the important allocateDirect() method.
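As a quick illustration of how these factory methods differ, the sketch below creates a heap buffer, a direct buffer, and a wrapped slice of a user array. The class name BufferFactories and the sizes are assumptions made for the example.

```java
import java.nio.ByteBuffer;

final class BufferFactories {
    /** Returns {heapIsDirect, directIsDirect, wrappedPosition, wrappedLimit, wrappedCapacity, array[4]}. */
    static int[] describe() {
        ByteBuffer heap = ByteBuffer.allocate(16);          // backed by an ordinary Java byte[]
        ByteBuffer direct = ByteBuffer.allocateDirect(16);  // backed by memory outside the Java heap
        byte[] array = new byte[16];
        ByteBuffer part = ByteBuffer.wrap(array, 4, 8);     // position=4, limit=12, capacity=16
        int position = part.position();
        int limit = part.limit();
        part.put((byte) 7);                                 // writes through to array[4]
        return new int[] {
            heap.isDirect() ? 1 : 0, direct.isDirect() ? 1 : 0,
            position, limit, part.capacity(), array[4]
        };
    }
}
```

The wrapped buffer shares storage with the array, so changes made through either one are visible through the other.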

Other Primitive Types in ByteBuffers
• It is possible to write other primitive types (char, int, double, etc.) to a ByteBuffer by methods like:
    ByteBuffer putChar(char value)
    ByteBuffer putChar(int index, char value)
    ByteBuffer putInt(int index, int value)
    …
  The putChar() methods do relative or absolute writes of the two bytes in a Java char, the putInt() methods write 4 bytes, and so on.
  – Of course there are corresponding getChar(), getInt(), … methods.
• These give you fun, unsafe ways of coercing bytes of one primitive type to another type, by writing data as one type and reading it as another.
• But actually this isn’t the interesting bit: this was always possible with the old java.io DataInputStream and DataOutputStream classes.
• The interesting bit is that the new ByteBuffer class has a method that allows you to set the byte order…

Endian-ness
• When identifying a numeric type like int or double with a sequence of bytes in memory, one can either put the most significant byte first (big-endian), or the least significant byte first (little-endian).
  – Big Endian: Sun Sparc, PowerPC CPU, numeric fields in IP headers, …
  – Little Endian: Intel processors
• In java.io, numeric types were always rendered to stream in big-endian order.
  – Creates a serious bottleneck when writing or reading numeric types.
  – Implementations typically must apply byte manipulation code to each item, to ensure bytes are written in the correct order.
• In java.nio, the programmer specifies the byte order as a property of a ByteBuffer, by calling one of:
    myBuffer.order(ByteOrder.BIG_ENDIAN)
    myBuffer.order(ByteOrder.LITTLE_ENDIAN)
    myBuffer.order(ByteOrder.nativeOrder())
• Provided the programmer ensures the byte order set for the buffer agrees with the native representation for the local processor, numeric data can be copied between JVM (which will use the native order) and buffer by a straight block memory copy, which can be extremely fast: a big win for NIO.
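To make the effect of order() concrete, the following sketch writes the same int under both byte orders and returns the raw bytes. The class name ByteOrderDemo and the helper encode() are illustrative, not from the slides.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderDemo {
    /** Encodes one int into 4 bytes using the given byte order. */
    static byte[] encode(int value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(order); // new buffers default to BIG_ENDIAN
        buf.putInt(value);                                    // relative write of 4 bytes
        return buf.array();                                   // the heap buffer's backing array
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(encode(0x01020304, ByteOrder.BIG_ENDIAN)));
        System.out.println(java.util.Arrays.toString(encode(0x01020304, ByteOrder.LITTLE_ENDIAN)));
        System.out.println("native order: " + ByteOrder.nativeOrder());
    }
}
```

For 0x01020304 the big-endian encoding is the bytes 1, 2, 3, 4 and the little-endian encoding is 4, 3, 2, 1; the native order printed last depends on the machine.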

View Buffers
• ByteBuffer has no methods for bulk transfer of arrays other than type byte[]. Instead, create a view of (a portion of) a ByteBuffer as any other kind of typed buffer, then use the bulk transfer methods on that view.
• The following methods of ByteBuffer create views:
    CharBuffer asCharBuffer()
    IntBuffer asIntBuffer()
    …
  – To create a view of just a portion of a ByteBuffer, set position and limit appropriately beforehand; the created view only covers the region between these.
  – You cannot create views of typed buffers other than ByteBuffer.
  – You can create another buffer that represents a subsection of any buffer (without changing element type) by using the slice() method.
• For example, writing an array of floats to a byte buffer, starting at the current position:
    float[] array;
    …
    FloatBuffer floatBuf = byteBuf.asFloatBuffer();
    floatBuf.put(array);
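A runnable variant of the float example might look like the sketch below. The class name ViewBufferDemo, the use of a direct buffer in native byte order, and the round-trip check are assumptions added for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class ViewBufferDemo {
    /** Writes floats into a byte buffer through a view, then reads them back through a second view. */
    static float[] roundTrip(float[] data) {
        ByteBuffer bytes = ByteBuffer.allocateDirect(data.length * Float.BYTES)
                                     .order(ByteOrder.nativeOrder()); // views inherit this order
        FloatBuffer floats = bytes.asFloatBuffer(); // view of the region between position and limit
        floats.put(data);                           // bulk transfer; only the view's position moves
        // The byte buffer's own position is still 0, so a fresh view starts at the first float.
        float[] out = new float[data.length];
        bytes.asFloatBuffer().get(out);
        return out;
    }
}
```

One point worth noticing is that the view keeps its own position and limit: after the bulk put, the underlying byte buffer's position is unchanged, so it has to be adjusted explicitly before the bytes are handed to a channel.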

Channels

Channels
• A channel is a new abstraction in java.nio.
  – In the package java.nio.channels.
• Channels are a high-level version of the file descriptors familiar from POSIX-compliant operating systems.
  – So a channel is a handle for performing I/O operations and various control operations on an open file or socket.
• For those familiar with conventional Java I/O, java.nio associates a channel with any RandomAccessFile, FileInputStream, FileOutputStream, Socket, ServerSocket or DatagramSocket object.
  – The channel becomes a peer to the conventional Java handle objects; the conventional objects still exist, and in general retain their role. The channel just provides extra NIO-specific functionality.
• NIO buffer objects can be written to or read from channels directly. Channels also play an essential role in readiness selection, discussed in the next section.

Simplified Channel Hierarchy
[Figure: simplified channel class hierarchy]
Some of the “inheritance” arcs here are indirect: we left out some interesting intervening classes and interfaces.

Opening Channels
• Socket channel classes have static factory methods called open(), e.g.:
    SocketChannel sc = SocketChannel.open();
    sc.connect(new InetSocketAddress(hostname, portnumber));
• File channels cannot be created directly; first use conventional Java I/O mechanisms to create a FileInputStream, FileOutputStream, or RandomAccessFile, then apply the new getChannel() method to get an associated NIO channel, e.g.:
    RandomAccessFile raf = new RandomAccessFile(filename, "r");
    FileChannel fc = raf.getChannel();

Using Channels
• Any channel that implements the ByteChannel interface (i.e. all channels except ServerSocketChannel) provides a read() and a write() instance method:
    int read(ByteBuffer dst)
    int write(ByteBuffer src)
  – These may look reminiscent of the read() and write() system calls in UNIX:
    ssize_t read(int fd, void *buf, size_t count)
    ssize_t write(int fd, const void *buf, size_t count)
  – The Java read() attempts to read from the channel as many bytes as there is space remaining in the dst buffer. It returns the number of bytes actually read, or -1 if end-of-stream, and updates the dst buffer position.
  – Similarly, write() attempts to write to the channel as many bytes as there are remaining in the src buffer. It returns the number of bytes actually written, and updates the src buffer position.

Example: Copying one Channel to Another
• This example assumes a source channel src and a destination channel dest:
    ByteBuffer buffer = ByteBuffer.allocateDirect(BUF_SIZE);
    while (src.read(buffer) != -1) {
        buffer.flip();                 // Prepare the buffer for “draining”
        while (buffer.hasRemaining())
            dest.write(buffer);
        buffer.clear();                // Empty buffer, ready to read next chunk.
    }
  – Note a write() call (or a read() call) may or may not succeed in transferring the whole buffer in a single call. Hence the need for the inner while loop.
  – The example introduces two new methods on Buffer: hasRemaining() returns true if position < limit; clear() sets position to 0 and limit to the buffer’s capacity.
  – Because copying is a common operation on files, FileChannel provides a couple of special methods to do just this:
    long transferTo(long position, long count, WritableByteChannel target)
    long transferFrom(ReadableByteChannel src, long position, long count)
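The copy loop and transferTo() can be wrapped into a small self-contained class as sketched below. The class name ChannelCopy, the 8192-byte buffer size, and the use of FileChannel.open() with java.nio.file paths (an API added in Java 7, later than the J2SE 1.4 material in the slides) are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChannelCopy {
    /** Buffer-based copy: drains src into dest and returns the number of bytes written. */
    static long copy(ReadableByteChannel src, WritableByteChannel dest) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(8192);
        long total = 0;
        while (src.read(buffer) != -1) {
            buffer.flip();                      // switch from filling to draining
            while (buffer.hasRemaining()) {
                total += dest.write(buffer);    // a single write may be partial
            }
            buffer.clear();                     // ready for the next chunk
        }
        return total;
    }

    /** File-to-file copy using transferTo(), which may also transfer fewer bytes than requested. */
    static long copyFile(Path from, Path to) throws IOException {
        try (FileChannel in = FileChannel.open(from, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(to, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }
}
```

The first method works for any pair of blocking byte channels; the second only applies when the source is a file channel.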

Memory-Mapped Files
• In modern operating systems one can exploit the virtual memory system to map a physical file into a region of program memory.
  – Once the file is mapped, accesses to the file can be extremely fast: one doesn’t have to go through read() and write() system calls.
  – One application might be a Web server, where you want to read a whole file quickly and send it to a socket.
  – Problems arise if the file structure is changed while it is mapped; use this technique only for fixed-size files.
• This low-level optimization is now available in Java. FileChannel has a method:
    MappedByteBuffer map(MapMode mode, long position, long size)
  – mode should be one of MapMode.READ_ONLY, MapMode.READ_WRITE, MapMode.PRIVATE.
  – The returned MappedByteBuffer can be used wherever an ordinary ByteBuffer can.
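As a small illustration of map(), the sketch below maps a whole file read-only and sums its bytes without any explicit read() calls. The class name MappedFileDemo and the checksum task are hypothetical choices for the example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

final class MappedFileDemo {
    /** Maps the whole file read-only and returns the sum of its bytes, treated as unsigned. */
    static long sumBytes(String filename) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(filename, "r");
             FileChannel fc = raf.getChannel()) {
            MappedByteBuffer map = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            long sum = 0;
            while (map.hasRemaining()) {
                sum += map.get() & 0xFF;   // ordinary relative get on the mapped region
            }
            return sum;
        }
    }
}
```

The mapping stays valid after the channel is closed; it is released only when the buffer itself is garbage collected.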

Scatter/Gather
• Often called vectored I/O, this just means you can pass an array of buffers to a read or write operation; the overloaded channel instance methods have signatures:
    long read(ByteBuffer[] dsts)
    long read(ByteBuffer[] dsts, int offset, int length)
    long write(ByteBuffer[] srcs)
    long write(ByteBuffer[] srcs, int offset, int length)
• The first form of read() attempts to read enough data to fill all buffers in the array, and divides it between them, in order.
• The first form of write() attempts to concatenate the remaining data in all buffers and write it.
  – The arguments offset and length select a subset of buffers from the arrays (not, say, an interval within buffers).
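A typical use is writing a fixed-size header and a body in one call, then scattering them back into separate buffers. The sketch below does this against a file channel; the class name ScatterGatherDemo, the header/body split, and the use of FileChannel.open() (Java 7 and later) are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ScatterGatherDemo {
    /** Gathers header and body into one file, then scatters the file back into two buffers. */
    static String[] roundTrip(Path file, String header, String body) throws IOException {
        byte[] h = header.getBytes(StandardCharsets.US_ASCII);
        byte[] b = body.getBytes(StandardCharsets.US_ASCII);
        try (FileChannel out = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer[] srcs = { ByteBuffer.wrap(h), ByteBuffer.wrap(b) };
            long remaining = h.length + b.length;
            while (remaining > 0) {
                remaining -= out.write(srcs);   // gathering write: buffers are drained in order
            }
        }
        ByteBuffer headerBuf = ByteBuffer.allocate(h.length);
        ByteBuffer bodyBuf = ByteBuffer.allocate(b.length);
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer[] dsts = { headerBuf, bodyBuf };
            while (headerBuf.hasRemaining() || bodyBuf.hasRemaining()) {
                if (in.read(dsts) < 0) break;   // scattering read: buffers are filled in order
            }
        }
        return new String[] {
            new String(headerBuf.array(), 0, headerBuf.position(), StandardCharsets.US_ASCII),
            new String(bodyBuf.array(), 0, bodyBuf.position(), StandardCharsets.US_ASCII)
        };
    }
}
```

Because the destination buffers are sized to the header and body lengths, the scattering read splits the stream at exactly the boundary where the gathering write joined it.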

SocketChannels
• As mentioned at the beginning of this section, socket channels are created directly with their own factory methods.
  – If you want to manage a socket connection as an NIO channel this is the only option. Creating an NIO socket channel implicitly creates a peer java.net socket object, but (contrary to the situation with file handles) the converse is not true.
• As with file channels, socket channels can be more complicated to work with than the traditional java.net socket classes, but provide much of the hard-boiled flexibility you get programming sockets in C.
• The most notable new facilities are that now socket communications can be non-blocking, they can be interrupted, and there is a selection mechanism that allows a single thread to do multiplex servicing of any number of channels.

Basic Socket Channel Operations
• Typical use of a server socket channel follows a pattern like:
    ServerSocketChannel ssc = ServerSocketChannel.open();
    ssc.socket().bind(new InetSocketAddress(port));
    while (true) {
        SocketChannel sc = ssc.accept();
        … process a transaction with client through sc …
    }
• The client does something like:
    SocketChannel sc = SocketChannel.open();
    sc.connect(new InetSocketAddress(serverName, port));
    … initiate a transaction with server through sc …
• The elided code above will typically be using read() and write() calls on the SocketChannel to exchange data between client and server.
  – So there are four important operations: accept(), connect(), write(), read().

Nonblocking Operations
• By calling the method
    socket.configureBlocking(false);
  on a socket channel, you put it into nonblocking mode (calling again with argument true restores blocking mode, and so on).
• In nonblocking mode:
  – A read() operation only transfers data that is immediately available. If no data is immediately available it returns 0.
  – Similarly, if data cannot be immediately written to a socket, a write() operation will immediately return 0.
  – For a server socket, if no client is currently trying to connect, the accept() method immediately returns null.
  – The connect() method is more complicated: generally connections would always block for some interval waiting for the server to respond.
    » In nonblocking mode connect() generally returns false, but the negotiation with the server is nevertheless started. The finishConnect() method on the same socket should be called later. It also returns immediately. Repeat until it returns true.
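The accept() and connect()/finishConnect() behaviour can be tried in a single thread over the loopback interface, as in the sketch below. The class name NonBlockingDemo, the use of an OS-assigned port, and the simple polling loops are assumptions made for the example; a real application would normally use a selector rather than spin.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

final class NonBlockingDemo {
    /** Returns true if accept() reported "no client yet" and the later connection completed. */
    static boolean connectOverLoopback() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocketChannel server = ServerSocketChannel.open();
             SocketChannel client = SocketChannel.open()) {
            server.socket().bind(new InetSocketAddress(loopback, 0)); // port 0: let the OS choose
            server.configureBlocking(false);
            boolean noClientYet = (server.accept() == null);          // returns at once with null

            client.configureBlocking(false);
            int port = server.socket().getLocalPort();
            boolean connected = client.connect(new InetSocketAddress(loopback, port));
            while (!connected) {
                connected = client.finishConnect();                   // poll until negotiation ends
            }
            SocketChannel peer = null;
            while (peer == null) {
                peer = server.accept();                               // poll for the queued client
            }
            peer.close();
            return noClientYet && client.isConnected();
        }
    }
}
```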

Interruptible Operations
• The standard channels in NIO are all interruptible.
• If a thread is blocked waiting on a channel, and the thread’s interrupt() method is called, the channel will be closed, and the thread will be woken and sent a ClosedByInterruptException.
  – To avoid race conditions, the same will happen if an operation on a channel is attempted by a thread whose interrupt status is already true.
  – See the lecture on threads for a discussion of interrupts.
• This represents progress over traditional Java I/O, where interruption of blocking operations was not guaranteed.
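The second case above (interrupt status already set before the operation starts) is easy to reproduce deterministically. In this sketch the class name InterruptDemo and the choice of a file channel are illustrative assumptions; FileChannel.open() is the Java 7 factory method rather than the getChannel() route shown in the slides.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class InterruptDemo {
    /** Returns true if a read attempted by an already-interrupted thread closed the channel. */
    static boolean readWhileInterrupted(Path file) throws IOException {
        FileChannel fc = FileChannel.open(file, StandardOpenOption.READ);
        Thread.currentThread().interrupt();        // set the interrupt status before the I/O call
        try {
            fc.read(ByteBuffer.allocate(16));
            return false;                          // not expected: the read should be refused
        } catch (ClosedByInterruptException e) {
            return !fc.isOpen();                   // the channel has been closed as a side effect
        } finally {
            Thread.interrupted();                  // clear the status so later code is unaffected
            fc.close();                            // harmless if the channel is already closed
        }
    }
}
```

The important practical consequence is that the channel cannot be reused after the interrupt; it has to be reopened.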

Other Features of Channels
• File channels provide a quite general file locking facility. This is presumably important to many applications (database applications), but less obviously so to HPC operations, so we don’t discuss it here.
• There is a DatagramChannel for sending UDP-style messages. This may well be important for high-performance communications, but we don’t have time to discuss it.
• There is a special channel implementation representing a kind of pipe, which can be used for inter-thread communication.

Selectors

Readiness Selection
• Prior to New I/O, Java provided no standard way of selecting, from a set of possible socket operations, just the ones that are currently ready to proceed, so the ready operations can be immediately serviced.
  – One application would be in implementing an MPI-like message passing system: in general incoming messages from multiple peers must be consumed as they arrive and fed into a message queue, until the user program is ready to handle them.
  – Previously one could achieve equivalent effects in Java by doing blocking I/O operations in separate threads, then merging the results through Java thread synchronization. But this can be inefficient because thread context switching and synchronization are quite slow.
• One way of achieving the desired effect in New I/O would be to set all the channels involved to non-blocking mode, and use a polling loop to wait until some are ready to proceed.
• A more structured, and potentially more efficient, approach is to use Selectors.
  – In many flavors of UNIX this is achieved by using the select() system call.

Classes Involved in Selection
• Selection can be done on any channel extending SelectableChannel; amongst the standard channels this means the three kinds of socket channel (and the two ends of a pipe).
• The class that supports the select() operation itself is Selector. This is a sort of container class for the set of channels in which we are interested.
• The last class involved is SelectionKey, which is said to represent the binding between a channel and a selector.
  – In some sense it is part of the internal representation of the Selector, but the NIO designers decided to make it an explicit part of the API.

Setting Up Selectors
• A selector is created by the open() factory method. This is naturally a static method of the Selector class.
• A channel is added to a selector by calling the method:
    SelectionKey register(Selector sel, int ops)
  – This, slightly oddly, is an instance method of the SelectableChannel class; you might have expected the register() method to be a member of Selector.
  – Here ops is a bit-set representing the interest set for this channel, composed by OR-ing together one or more of:
    SelectionKey.OP_READ
    SelectionKey.OP_WRITE
    SelectionKey.OP_CONNECT
    SelectionKey.OP_ACCEPT
  – A channel added to a selector must be in nonblocking mode!
• The register() method returns the SelectionKey created.
  – Since this automatically gets stored in the Selector, in most cases you probably don’t need to save the result yourself.

Example
• Here we create a selector, and register three pre-existing channels to the selector:
    Selector selector = Selector.open();
    channel1.register(selector, SelectionKey.OP_READ);
    channel2.register(selector, SelectionKey.OP_WRITE);
    channel3.register(selector, SelectionKey.OP_READ | SelectionKey.OP_WRITE);
  – For channel1 the interest set is reads only, for channel2 it is writes only, for channel3 it is reads and writes.
• Note channel1, channel2, channel3 must all be in nonblocking mode at this time, and must remain in that mode as long as they are registered in any selector.
  – You remove a channel from a selector by calling the cancel() method of the associated SelectionKey.
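Registration and cancellation can be tried without any network setup by using the two ends of a java.nio.channels.Pipe, which are also selectable channels. The sketch below is an illustration under that assumption; the class name RegisterDemo is hypothetical, and a pipe's source supports only OP_READ while its sink supports only OP_WRITE.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class RegisterDemo {
    /** Returns {keysAfterRegister, readInterestOps, writeInterestOps, keysAfterCancel}. */
    static int[] registerPipeEnds() throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {
            source.configureBlocking(false);   // registering a blocking channel would throw
            sink.configureBlocking(false);     // IllegalBlockingModeException
            SelectionKey readKey = source.register(selector, SelectionKey.OP_READ);
            SelectionKey writeKey = sink.register(selector, SelectionKey.OP_WRITE);
            int registered = selector.keys().size();
            int readOps = readKey.interestOps();
            int writeOps = writeKey.interestOps();

            readKey.cancel();                  // request removal of the source channel
            selector.selectNow();              // cancelled keys are dropped during a selection
            int afterCancel = selector.keys().size();
            return new int[] { registered, readOps, writeOps, afterCancel };
        }
    }
}
```

Note that cancel() only marks the key; the selector's key set shrinks at the next selection operation.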

select() and the Selected Key Set
• To inspect the set of channels, to see what operations are newly ready to proceed, you call the select() method on the selector.
  – The return value is an integer, which will be zero if no status changes occurred.
  – More interesting than the return value is the side effect this method has on the set of selected keys embedded in the selector.
• To use selectors, you must understand that a selector maintains a Set object representing this selected key set.
  – Because each key is associated with a channel, this is equivalent to a set of selected channels.
  – The set of selected keys is different from (it is a subset of) the registered key set.
  – Each time the select() method is called it may add new keys to the selected key set, as operations become ready to proceed.
  – You, as the programmer, are responsible for explicitly removing keys from the selected key set belonging to the selector, as you deal with operations that have become ready.

Ready Sets
• This is quite complicated already, but there is one more complication.
• We saw that each key in the registered key set has an associated interest set, which is a subset of the 4 possible operations on sockets.
• Similarly, each key in the selected key set has an associated ready set, which is a subset of the interest set, representing the actual operations that have been found ready to proceed.
• Besides adding new keys to the selected key set, a select() operation may add new operations to the ready set of a key already in the selected key set.
  – Assuming the selected key set was not cleared after a preceding select().
• You can extract the ready set from a SelectionKey as a bit-set, by using the method readyOps(). Or you can use the convenience methods:
    isReadable()
    isWritable()
    isConnectable()
    isAcceptable()
  which effectively return the bits of the ready set individually.

A Pattern for Using select()
    … register some channels with selector …
    while (true) {
        selector.select();
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            if (key.isReadable())
                … perform read() operation on key.channel() …
            if (key.isWritable())
                … perform write() operation on key.channel() …
            if (key.isConnectable())
                … perform finishConnect() operation on key.channel() …
            if (key.isAcceptable())
                … perform accept() operation on key.channel() …
            it.remove();
        }
    }

Remarks
• This general pattern will probably serve for most uses of select():
  1. Perform select() and extract the new selected key set.
  2. For each selected key, handle the actions in its ready set.
  3. Remove the processed key from the selected key set.
    » Note the remove() operation on an Iterator removes the current item from the underlying container.
• More generally, the code that handles a ready operation may also alter the set of channels registered with the selector.
  – E.g. after doing an accept() you may want to register the returned SocketChannel with the selector, to wait for read() or write() operations.
• In many cases only a subset of the possible operations read, write, accept, connect are ever in interest sets of keys registered with the selector, so you won’t need all 4 tests.
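The three steps can be exercised end to end in a single thread with a pipe standing in for a socket, as sketched below. The class name SelectLoopDemo and the pipe-based setup are assumptions for illustration; only the read test is needed because OP_READ is the only operation in the interest set, and the message should be short enough to fit in the pipe's internal buffer since nothing reads while it is being written.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

final class SelectLoopDemo {
    /** Sends a short message through a pipe and receives it using the select() pattern. */
    static String receiveOnce(String message) throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {
            source.configureBlocking(false);
            source.register(selector, SelectionKey.OP_READ);

            ByteBuffer out = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
            int expected = out.remaining();
            while (out.hasRemaining()) {
                sink.write(out);                      // the sink is left in blocking mode
            }

            ByteBuffer in = ByteBuffer.allocate(expected);
            while (in.hasRemaining()) {
                selector.select();                    // 1. block until something is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    if (key.isReadable()) {           // 2. handle the ready operation
                        ((ReadableByteChannel) key.channel()).read(in);
                    }
                    it.remove();                      // 3. remove the processed key
                }
            }
            in.flip();
            return StandardCharsets.UTF_8.decode(in).toString();
        }
    }
}
```

If the it.remove() call is omitted, the key stays in the selected key set and is presented again on every later pass even when no new data has arrived.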

Key Attachments
• One problem with the pattern above is that when it.next() returns a key, there is no convenient way of getting information about the context in which the associated channel was registered with the selector.
  – For example channel1 and channel3 are both registered for OP_READ. But the action that should be taken when the read becomes ready may be quite different for the two channels.
  – You need a convenient way to determine which channel the returned key is bound to.
• You can specify an arbitrary object as an attachment to the key when you create it; later when you get the key from the selected set, you can extract the attachment, and use its content to decide what to do.
  – At its most basic the attachment might just be an index identifying the channel.

Simplistic Use of Key Attachments
    channel1.register(selector, SelectionKey.OP_READ,
                      Integer.valueOf(1));   // attachment
    …
    channel3.register(selector, SelectionKey.OP_READ | SelectionKey.OP_WRITE,
                      Integer.valueOf(3));   // attachment
    …
    while (true) {
        …
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        …
        SelectionKey key = it.next();
        if (key.isReadable())
            switch (((Integer) key.attachment()).intValue()) {
                case 1 : … action appropriate to channel 1 …
                case 3 : … action appropriate to channel 3 …
            }
        …
    }
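A runnable version of the attachment idea is sketched below, again using pipes in place of sockets so that it needs no network. The class name AttachmentDemo and the ids 1 and 3 are illustrative; only the second pipe receives data, so the attachment recovered from the selected key should identify it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

final class AttachmentDemo {
    /** Registers two pipe sources with Integer attachments and returns the id of the ready one. */
    static int readyChannelId() throws IOException {
        Pipe first = Pipe.open();
        Pipe second = Pipe.open();
        try (Selector selector = Selector.open()) {
            first.source().configureBlocking(false);
            second.source().configureBlocking(false);
            first.source().register(selector, SelectionKey.OP_READ, Integer.valueOf(1));
            second.source().register(selector, SelectionKey.OP_READ, Integer.valueOf(3));

            second.sink().write(ByteBuffer.wrap(new byte[] { 42 })); // only pipe 3 gets data

            selector.select();
            int id = -1;
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                if (key.isReadable()) {
                    id = (Integer) key.attachment();   // the attachment lives on the key itself
                }
                it.remove();
            }
            return id;
        } finally {
            first.source().close();
            first.sink().close();
            second.source().close();
            second.sink().close();
        }
    }
}
```

In practice the attachment is more often a per-connection state object (buffers, protocol state) than a bare index, which removes the need for a switch statement altogether.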

Conclusion
• We briefly visited several topics in New I/O that are likely to be interesting for HPC with Java.
  – Some topics that are less obviously relevant we skipped, like file locking, and regular expressions.
  – Also we didn’t cover datagram channels, which may well be relevant.
• New I/O has been widely hailed as an important step forward in getting serious performance out of the Java platform.
• See the paper: “MPJava: High-Performance Message Passing in Java using java.nio”, William Pugh and Jaime Spacco, for a good example of how New I/O may affect the “Java for HPC” landscape.
