HTTP Servers: Concurrency and Performance
Jeff Chase, Duke University

Inside your server
• Measures: offered load, response time, throughput, utilization
• Server application (Apache, Tomcat/Java, etc.)
[Figure: requests pass through packet queues and the listen queue into the accept queue, then to the server application.]

HTTP Server
• An HTTP server:
– Creates a socket (socket)
– Binds to an address (bind)
– Listens to set up the accept backlog (listen)
– Can call accept to block waiting for connections
– (Can call select to check for data on multiple sockets)
• Handles requests of the form:
GET /index.html HTTP/1.0\n
<optional body, multiple lines>\n
\n

Example: Video On Demand
Client() {
  fd = connect("server");
  write(fd, "video.mpg");
  while (!eof(fd)) {
    read(fd, buf);
    display(buf);
  }
}
Server() {
  while (1) {
    cfd = accept();
    read(cfd, name);
    fd = open(name);
    while (!eof(fd)) {
      read(fd, block);
      write(cfd, block);
    }
    close(cfd);
    close(fd);
  }
}
How many clients can the server support? Suppose, say, 200 Kb/s video on a 100 Mb/s network link? [MIT/Morris]

Performance "analysis"
• Server capacity:
– Network (100 Mb/s)
– Disk (20 MB/s)
• Obtained performance: one client stream
• The server is limited by its software structure
• If a video is 200 Kb/s, the server should be able to support more than one client.
[MIT/Morris]

WebServer Flow
Create ServerSocket → connSocket = accept() → read request from connSocket → read local file → write file to connSocket → close connSocket
Discussion: what does each step do, and how long does it take?

TCP socket space
[Figure: sockets on host 128.36.232.5: one connection in state established with address {128.36.232.5:6789, 198.69.10.10:1500} and its send/receive buffers; two sockets in state listening with addresses {*.6789, *.*} and {*.25, *.*}, each with a completed connection queue (the accept backlog).]

Web Server Processing Steps
Accept Client Connection (may block waiting on network) → Read HTTP Request Header → Find File → Send HTTP Response Header → Read File / Send Data (may block waiting on disk I/O)
Want to be able to process requests concurrently.

Process States and Transitions
[Figure: thread states running (user), running (kernel), ready, and blocked (kernel); transitions: trap/return and interrupt/exception between user and kernel, Sleep from running to blocked, Wakeup from blocked to ready, Run from ready to running, Yield from running to ready.]

Server Blocking Under the Hood
• accept() blocks when no connect requests are waiting on the listen queue
– What if the server has multiple ports to listen on?
• E.g., 80 for HTTP, 443 for HTTPS
• open/read/write on server files can block
• read() on a socket blocks if the client is sending too slowly
• write() on a socket blocks if the client is receiving too slowly
– Yes, TCP has flow control, like pipes
What if the server blocks while serving one client, and another client has work to do?

Concurrency and Pipelining
[Figure: before: the CPU, disk, and network take turns, each idle while the others work; after: blocks of multiple requests are pipelined so CPU, disk, and network stay busy concurrently.]
[Figure: requests start at arrival rate λ, cycle between the CPU and an I/O device (I/O request / I/O completion), and exit; throughput is λ until some center saturates.]

Better single-server performance
• Goal: run at the server's hardware speed
– Disk or network should be the bottleneck
• Method:
– Pipeline blocks of each request
– Multiplex requests from multiple clients
• Two implementation approaches:
– Multithreaded server
– Asynchronous I/O
[MIT/Morris]

Multiple Process Architecture
[Figure: Process 1 … Process N in separate address spaces, each running Accept Conn → Read Request → Find File → Send Header → Read File → Send Data.]
• Advantages
– Simple programming while addressing the blocking issue
• Disadvantages
– Many processes; large context switch overheads
– Consumes much memory
– Optimizations involving sharing information among processes (e.g., caching) are harder

Using Threads
[Figure: Thread 1 … Thread N in one address space, each running Accept Conn → Read Request → Find File → Send Header → Read File → Send Data.]
• Example: a multi-threaded WebServer, which creates a thread for each request
• Advantages
– Lower context switch overheads
– Shared address space simplifies optimizations (e.g., caches)
• Disadvantages
– Need kernel-level threads (why?)
– Some extra memory needed to support multiple stacks
– Need thread-safe programs, synchronization

Multithreaded server
server() {
  while (1) {
    cfd = accept();
    read(cfd, name);
    fd = open(name);
    while (!eof(fd)) {
      read(fd, block);
      write(cfd, block);
    }
    close(cfd);
    close(fd);
  }
}
for (i = 0; i < 10; i++)
  threadfork(server);
• When a thread waits for I/O, the thread scheduler runs another thread
• What about references to shared data?
• Synchronization
[MIT/Morris]
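To make the thread-per-request structure concrete, here is a minimal POSIX-threads sketch of the slide's server() loop in C. The block size, the request framing (a bare file name), and the bare-bones error handling are simplifying assumptions, not part of the original:

#include <pthread.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/socket.h>

/* Serve one connection: read a file name, stream the file back. */
static void *serve(void *arg) {
    int cfd = (int)(long)arg;                /* connection fd, smuggled in the pointer */
    char name[256], block[4096];
    ssize_t n = read(cfd, name, sizeof(name) - 1);
    if (n > 0) {
        name[n] = '\0';
        int fd = open(name, O_RDONLY);
        if (fd >= 0) {
            while ((n = read(fd, block, sizeof(block))) > 0)
                write(cfd, block, n);        /* may block on TCP flow control */
            close(fd);
        }
    }
    close(cfd);
    return NULL;
}

/* Accept loop: one detached thread per connection (listener setup omitted). */
static void serve_forever(int lfd) {
    for (;;) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0) continue;
        pthread_t t;
        pthread_create(&t, NULL, serve, (void *)(long)cfd);
        pthread_detach(t);                   /* thread is reclaimed when it exits */
    }
}

If one thread blocks on disk or on a slow client, the others keep making progress, which is exactly the point of the slide.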
Threads
• Use multiple threads/processes so that only the flow processing a particular request is blocked
– Java: extends Thread or implements the Runnable interface
• A thread is a schedulable stream of control
– defined by CPU register values (PC, SP)
– suspend: save register values in memory
– resume: restore registers from memory
• Multiple threads can execute independently:
– They can run in parallel on multiple CPUs: physical concurrency
– …or arbitrarily interleaved on a single CPU: logical concurrency
• Each thread must have its own stack

Event-Driven Programming
• One execution stream: no CPU concurrency
• Register interest in events (callbacks)
• Event loop waits for events, invokes handlers
• No preemption of event handlers
• Handlers are generally short-lived
[Figure: an event loop dispatching to event handlers.]
[Ousterhout 1995]

Single Process Event Driven (SPED)
[Figure: a single event dispatcher drives Accept Conn → Read Request → Find File → Send Header → Read File → Send Data.]
• Single threaded
• Asynchronous (non-blocking) I/O
• Advantages
– Single address space
– No synchronization
• Disadvantages
– In practice, disk reads still block

Asymmetric Multi-Process Event Driven (AMPED)
[Figure: an event dispatcher drives the same pipeline, with helper processes (Helper 1, Helper 2, …) for disk I/O.]
• Like SPED, but use helper processes/threads for disk I/O
• Use IPC to communicate with the helper processes
• Advantages
– Shared address space for most web server functions
– Concurrency for disk I/O
• Disadvantages
– IPC between the main thread and helper threads
This hybrid model is used by the "Flash" web server.

Event-Based Concurrent Servers Using I/O Multiplexing
• Maintain a pool of connected descriptors.
• Repeat the following forever:
– Use the Unix select function to block until:
• (a) a new connection request arrives on the listening descriptor, or
• (b) new data arrives on an existing connected descriptor.
– If (a), add the new connection to the pool of connections.
– If (b), read any available data from the connection.
• Close the connection on EOF and remove it from the pool.

Select
• If a server has many open sockets, how does it know when one of them is ready for I/O?
int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
• Issues with scalability: alternative event interfaces have been offered.
[CMU 15-213]
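A minimal sketch of that select() loop in C, keeping the pool of connected descriptors in an fd_set; handle_data() is a hypothetical handler, not part of the original slides:

#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

int handle_data(int fd);        /* hypothetical: read available data; returns 0 on EOF */

void event_loop(int lfd) {      /* lfd: the listening descriptor */
    fd_set pool;
    int maxfd = lfd;
    FD_ZERO(&pool);
    FD_SET(lfd, &pool);
    for (;;) {
        fd_set ready = pool;    /* select() overwrites its argument, so copy */
        if (select(maxfd + 1, &ready, NULL, NULL, NULL) < 0)
            continue;
        for (int fd = 0; fd <= maxfd; fd++) {
            if (!FD_ISSET(fd, &ready))
                continue;
            if (fd == lfd) {                       /* (a) new connection request */
                int cfd = accept(lfd, NULL, NULL);
                if (cfd >= 0) {
                    FD_SET(cfd, &pool);
                    if (cfd > maxfd) maxfd = cfd;
                }
            } else if (handle_data(fd) == 0) {     /* (b) data on a connection */
                close(fd);                         /* EOF: drop it from the pool */
                FD_CLR(fd, &pool);
            }
        }
    }
}

Note the linear scan over all descriptors on every iteration: this per-call cost is the scalability issue that motivated the alternative event interfaces (e.g., epoll, kqueue) mentioned above.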
Asynchronous server
struct callback {
  bool (*is_ready)(void *arg);
  void (*cb)(void *arg);
  void *arg;
};
main() {
  while (1) {
    for (c = each callback) {
      if (c->is_ready(c->arg))
        c->cb(c->arg);
    }
  }
}
• Code is structured as a collection of handlers
• Handlers are nonblocking
• Create new handlers for blocking operations
• When the operation completes, call the handler
[MIT/Morris]

Asynchronous I/O
init() { on_accept(accept_cb); }
accept_cb(cfd) { on_readable(cfd, name_cb); }
on_readable(fd, fn) {
  c = new callback(test_readable, fn, fd);
  add c to the callback list;
}
name_cb(cfd) {
  read(cfd, name);
  fd = open(name);
  on_readable(fd, read_cb);
}
read_cb(cfd, fd) {
  read(fd, block);
  on_writeable(cfd, write_cb);   /* wait until the client socket can accept data */
}
write_cb(cfd, fd) {
  write(cfd, block);
  on_readable(fd, read_cb);      /* then go back for the next file block */
}
[MIT/Morris]

Multithreaded vs. Async
• Multithreaded:
– Hard to program: locking code; need to know what blocks
– Coordination is explicit
– State stored on the thread's stack; memory allocation is implicit
– Context switch may be expensive
– Exploits multiprocessors
• Async:
– Hard to program: callback code; need to know what blocks
– Coordination is implicit
– State passed around explicitly; memory allocation is explicit
– Lightweight context switch
– Uniprocessors
[MIT/Morris]

Coordination example
• Threaded server:
– A thread for the network interface
– An interrupt wakes up the network thread
– A protected (locks and condition variables) buffer shared between the server threads and the network thread (see the sketch below)
• Asynchronous I/O:
– Poll for packets
• How often to poll?
– Or, an interrupt generates an event
• Be careful: disable interrupts when manipulating the callback queue
[MIT/Morris]
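A sketch of that protected shared buffer using a pthread mutex and condition variable; the ring size and the opaque packet representation are assumptions for illustration:

#include <pthread.h>

#define SLOTS 128

/* Packet buffer shared by the network thread (producer)
   and the server threads (consumers). */
struct pktbuf {
    void *slot[SLOTS];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
};

void put(struct pktbuf *b, void *pkt) {      /* called by the network thread */
    pthread_mutex_lock(&b->lock);
    if (b->count < SLOTS) {                  /* sketch: drop the packet if full */
        b->slot[b->tail] = pkt;
        b->tail = (b->tail + 1) % SLOTS;
        b->count++;
        pthread_cond_signal(&b->nonempty);   /* wake up one server thread */
    }
    pthread_mutex_unlock(&b->lock);
}

void *get(struct pktbuf *b) {                /* called by server threads */
    pthread_mutex_lock(&b->lock);
    while (b->count == 0)                    /* recheck the condition after every wakeup */
        pthread_cond_wait(&b->nonempty, &b->lock);
    void *pkt = b->slot[b->head];
    b->head = (b->head + 1) % SLOTS;
    b->count--;
    pthread_mutex_unlock(&b->lock);
    return pkt;
}

This is the "coordination explicit" cell of the comparison above: the lock and condition variable appear in the code, rather than being implied by a run-to-completion event loop.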
One View: Should You Abandon Threads?
• No: they are important for high-end servers (e.g., databases).
• But avoid threads wherever possible:
– Use events, not threads, for GUIs, distributed systems, low-end servers.
– Only use threads where true CPU concurrency is needed.
– Where threads are needed, isolate their usage in a threaded application kernel: keep most of the code single-threaded.
[Figure: event-driven handlers sit on top of a threaded kernel.]
[Ousterhout 1995]

Another View: Control Flow
• Events obscure control flow
– For programmers and tools
Threads:
thread_main(int sock) {
  struct session s;
  accept_conn(sock, &s);
  read_request(&s);
  pin_cache(&s);
  write_response(&s);
  unpin(&s);
}
pin_cache(struct session *s) {
  pin(s);
  if (!in_cache(s))
    read_file(s);
}
Events:
AcceptHandler(event e) {
  struct session *s = new_session(e);
  RequestHandler.enqueue(s);
}
RequestHandler(struct session *s) {
  …;
  CacheHandler.enqueue(s);
}
CacheHandler(struct session *s) {
  pin(s);
  if (!in_cache(s))
    ReadFileHandler.enqueue(s);
  else
    ResponseHandler.enqueue(s);
}
...
ExitHandler(struct session *s) {
  …;
  unpin(s);
  free_session(s);
}
[Figure: web server stages Accept Conn. → Read Request → Pin Cache → Read File → Write Response → Exit; the thread version follows this flow in one function, while the event version scatters it across handlers.]
[von Behren]

Exceptions and State Management
• Exceptions complicate control flow
– Harder to understand program flow
– Cause bugs in cleanup code
• Events require manual state management
• Hard to know when to free state
– Use GC or risk bugs
With error handling, the same code becomes:
Threads:
thread_main(int sock) {
  struct session s;
  accept_conn(sock, &s);
  if (!read_request(&s))
    return;
  pin_cache(&s);
  write_response(&s);
  unpin(&s);
}
Events:
RequestHandler(struct session *s) {
  …;
  if (error)
    return;
  CacheHandler.enqueue(s);
}
[von Behren]

Internet Growth and Scale
[Figure: many clients across the Internet sending requests to a multithreaded server (Thread 1 … Thread N, each Accept Conn → Read Request → Find File → Send Header → Read File → Send Data).]
How to handle all those client requests raining on your server?

Servers Under Stress
[Figure: performance (throughput, latency) vs. load (concurrent requests, or offered arrival rate): performance rises to an ideal peak where some resource is at its maximum, then collapses under overload as some resource thrashes.]
• Ideal peak: some resource at max
• Overload: some resource thrashing
• Depends on:
– Cost/length of each request
– Load conditions at the server
[von Behren]

Response Time Components
• Wire time (request) +
• Queuing time +
• Service demand +
• Wire time (response)

Queuing Theory for Busy People
[Figure: an M/M/1 service center: a request stream at arrival rate λ waits in a queue, then is processed with mean service demand D.]
• Big assumptions:
– The queue is First-Come-First-Served (FIFO, FCFS).
– Request arrivals are independent (Poisson arrivals).
– Requests have independent service demands.
– i.e., the arrival interval and service demand are exponentially distributed (denoted "M").

Utilization
• What is the probability that the center is busy?
– Answer: some number between 0 and 1.
• What percentage of the time is the center busy?
– Answer: some number between 0 and 100.
• These are interchangeable: called utilization U.
• If the center is not saturated, i.e., it completes all its requests in some bounded time, then:
– U = λD (arrivals per unit time × service demand): the "Utilization Law"
– The probability that the service center is idle is 1 − U.

Little's Law
• For an unsaturated queue in steady state, mean response time R and mean queue length N are governed by Little's Law: N = λR.
• Suppose a task T is in the system for R time units. During that time:
– λR new tasks arrive.
– N tasks depart (all the tasks ahead of T).
• But in steady state, flow in balances flow out.
– Note: this means that throughput X = λ.

Inverse Idle Time "Law"
[Figure: R vs. U: response time grows without bound as utilization approaches 1 (100%).]
• The service center saturates as the arrival interval 1/λ approaches D: small increases in λ cause large increases in the expected response time R.
• Little's Law gives response time R = D/(1 − U):
– Intuitively, each task T's response time is R = D + DN: its own demand plus the demand of the N tasks ahead of it.
– Substituting λR for N: R = D + DλR
– Substituting U for λD: R = D + UR
– R − UR = D → R(1 − U) = D → R = D/(1 − U)
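A worked example (numbers invented for illustration): let D = 10 ms per request. At λ = 50 requests/s, U = λD = 0.5 and R = D/(1 − U) = 20 ms. At λ = 90 requests/s, U = 0.9 and R = 100 ms: the last 10% of capacity costs a 5x jump in response time. Little's Law confirms the queue buildup: N = λR = 90 × 0.1 = 9 requests in the system on average, versus 1 at λ = 50.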
Under the Hood
What does this tell us about server behavior at saturation?
[Figure: requests start at arrival rate λ, cycle between the CPU and an I/O device (I/O request / I/O completion), and exit; throughput is λ until some center saturates.]

Common Bottlenecks
• No more file descriptors
• Sockets stuck in TIME_WAIT
• High memory use (swapping)
• CPU overload
• Interrupt (IRQ) overload

Scaling Server Sites: Clustering
[Figure: clients reach virtual IP addresses (VIPs) on a smart switch that fronts a server array.]
• Goals:
– server load balancing
– failure detection
– access control filtering
– priorities/QoS
– request locality
– transparent caching
• What to switch/filter on?
– L3: source IP and/or VIP
– L4 (TCP): ports, etc.
– L7 (HTTP, SSL, etc.): URLs and/or cookies
– L7: SSL session IDs
[Aaron Bannert]

Scaling Services: Replication
[Figure: a client on the Internet choosing between Site A and Site B.]
• Distribute service load across multiple sites.
• How to select a server site for each client or request?
• Is it scalable?

Extra Slides
(Any new information on the following slides will not be tested.)

Problems of Multi-Thread Servers
• High resource usage, context switch overhead, contended locks
• Too many threads → throughput meltdown, response time explosion
• Solution: bound the total number of threads (sketched below)
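A minimal sketch of one way to bound the thread count in C: create a fixed pool of workers up front, all blocking in accept() on a shared listening socket (which POSIX permits), instead of forking a thread per request. The pool size and the hypothetical handle_request() are assumptions, not from the slides:

#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

#define NWORKERS 16                 /* the bound: at most 16 requests in service */

int handle_request(int cfd);        /* hypothetical per-request handler */

static int listen_fd;               /* shared listening socket */

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        /* Excess connections wait in the kernel's listen queue. */
        int cfd = accept(listen_fd, NULL, NULL);
        if (cfd < 0)
            continue;
        handle_request(cfd);
        close(cfd);
    }
    return NULL;
}

void start_pool(int lfd) {
    listen_fd = lfd;
    for (int i = 0; i < NWORKERS; i++) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_detach(t);
    }
}

Bounding the pool caps memory and context-switch overhead, trading peak concurrency for predictable behavior under overload.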
Event-Driven Programming
• Event-driven programming is also called asynchronous I/O
• Uses finite state machines (FSMs) to monitor the progress of requests
• Yields efficient and scalable concurrency
• Many examples: the Click router, the Flash web server, TP monitors, etc.
• Java: asynchronous I/O
– for an example, see: http://www.cafeaulait.org/books/jnp3/examples/12/

Traditional Processes
• Expensive and "heavyweight"
• One system call per process
• Fork overhead
• Coordination

Events
• Need async I/O; need select
– Wasn't originally available
– Not standardized; immature
– But efficient
• Code is distributed all through the program
• Harder to debug and understand

Threads
• Separate interface and implementation
– Pthreads interface
– Implementation is user-level or kernel ("native")
– If user-level, needs async I/O
• But hide the abstraction behind the thread interface

Reference
Valeria Cardellini, Emiliano Casalicchio, Michele Colajanni, and Philip S. Yu. "The State of the Art in Locally Distributed Web-Server Systems."