Math 365 Grady Wright Homework 3 Due Oct. 9, 2014 1. Gaussian Elimination. (10 points) Solve the following linear system of equations by hand using Gaussian Elimination with partial pivoting. ( 2x +4x3 +3x4 =4 1 −2x1 +2x3 −13x4 = 40 x1 +15x2 +2x3 −4.5x4 = 29 −4x1 +5x2 −7x3 −10x4 =9 The process you use to solve the system should follow exactly the process shown in the lecture slides (except now with partial pivoting). You will not receive full credit if you deviate from the standard algorithm. Show all your steps, and describe in words what you are doing in each step in terms of elementary row operations. When you complete the problem, sit back and be thankful we have computers that can be programmed to solve these types of problems. √ 2. Statics. (7 points) NCM, problem 2.3. Please also explain where the factor α = 1/ 2 comes from in the set of equations. 3. Cholesky factorization. (8 points) NCM 2.5(a) 4. Timing a linear solve. (30 points) In this problem, you will investigate the cost of solving linear systems and compare your results to the theoretical computational cost we have discussed in class. Before starting this problem, you will need the following information about the laptop or PC you are working on. • The amount of RAM in your computer. Typical values are betweem 4 and 12 GB (gigabytes). • The clock speed of your PC or laptop. A typical value is 2.5 GHz (gigahertz). (a) Experimental setup. Determine a matrix size that you can comfortably fit into your available RAM. For example, if you have a 4 GB machine, you should be able to comfortably store a matrix that occupies about 100MB. Store this value in a variable “Mb”. Use the following information to compute a maximum matrix dimension N that you can store in Mb megabytes of memory. • • • • A megabyte has 1024 kilobytes A kilobyte is 1024 bytes A floating point number is 8 bytes. An N × N matrix contains N 2 floating point numbers. Call the N you compute ’nmax’ (Nmax ). (b) Calibrate the timing experiment. We need to “calibrate” our experimental set up by timing an operation whose flop count is easy to compute. Create two random matrices A and B each of size Nmax × Nmax . Using the Matlab functions tic and toc, determine how much time (seconds) it takes to compute the product AB. Determine the number of floating point operations (additions and multiplications) it takes to compute the Nmax × Nmax matrix-matrix product. Use this number to estimate the number of floating point operations per second (’flops’) your computer can carry out. Call this flop rate ’flops’. Compare this number to the theoretical number obtained using your computer’s clock speed. The following information might be useful. Figure 1: Timing plot for problem 3. The x-axis is the matrix size N (N × N matrix), and the y-axis is the time (in seconds) as measured by tic and toc for solving a linear system of size N . • A gigahertz is 109 hertz. • A hertz is equal to one clock cycle. • A typical microprocessor (CPU) can compute 4 flops per clock cycle. If you have a dual or quad core machine, you may also want to investigate whether Matlab is making use of multiple cores. (c) Time a dense linear solve. Create an integer sequence of values Nvec between, say, N = 100 and the Nmax you found above. To generate this sequence, it is a good idea to use the logspace command. Using a for loop, loop over the entries Ni of the sequence. In each pass through the loop, create a random matrix A of size Ni × Ni , and a random right hand side vector b of size Ni × 1. Then, using tic and toc, time how long it takes to solve the system Ax = b using the backslash operator. Store this time as the ith entry in a vector ’lutimes’. (d) On a log-log plot, plot the time values you found above (stored in lutimes) verses the Ni in your sequence of N values. On the same set of axis, plot the curve of the theoretical time estimated by the operation count we discussed in class. Be sure to use the flop rate ’flops’ you computed above to get a time (in seconds) from an operation count. The two plots should be very close. Be sure to label your plots. Use the Matlab legend command to identify each curve that you get. Your plot should look something like the figure shown in Figure 1. (e) (Extra Credit - 10 points) Visit the website http://www.top500.org and answer the following questions. • Using the value from the ’Linpack Benchmark’, what is the performance of the world’s fastest supercomputer, measured in GFlops (= 109 flops)? • How much faster is the world’s fastest computer than your computer or laptop? • How recently would your computer have made it on the Top 500 list of the world’s fastest computers? • When can we expect to see an exaflop machine? 2 The ’Linpack Benchmark’ solves a dense linear system of equations, very much like what you have done for this problem. 5. Countercurrent Exchangers. (30 points). Countercurrent exchangers occur both in nature and in industry. As an example, consider an n-stage countercurrent exchanger for extracting a chemical from a fluid as illustrated in the figure below. In this example, the fluid (such as water) contains a mass fraction xin of some chemical as it enters the extractor on the top left at a mass flow rate of W . At the same time, a solvent containing a mass fraction yin of the same chemical enters the extractor on the bottom left at a mass flow rate of S. At each stage of the reactor, the chemical in the fluid is removed and transferred to the solvent by some chemical process. Assuming no chemical is created or destroyed in any stage, we can use the conservation of mass (mass balance) to model this process. This yields the following set of linear equations for the mass fraction of the chemical in the solvent and fluid at any stage i. W xi−1 + Syi+1 {z } | = Total mass fraction entering stage i W xi + Syi | {z } , i = 2, 3, . . . , n − 1 Total mass fraction leaving stage i The equations for stage 1 and n are given as W xin + Sy2 = W x1 + Sy1 W xn−1 + Syin = W xn + Syn Suppose that at each stage yi is proportional to xi , i.e. yi = mxi , where m is some constant that depends on the target chemical. Then the above system of equations can be rewritten just in terms of xi , i = 1, . . . , n as −(W + Sm)x1 + Smx2 = −W xin , W xi−1 − (W + Sm)xi + Smxi+1 = 0 (i = 2, 3, 4, . . . , n − 1), W xn−1 − (W + Sm)xn = −Syin . Chemicallyrich fluid Chemicallyclean fluid Stage n Stage n-1 Stage i Stage 2 Stage 1 Chemicallyclean solvent Chemicallyrich solvent Figure 2: Schematic of the countercurrent exchanger for extracting a chemical from a fluid with mass fraction xin . (a) Write a Matlab function that computes the mass fraction of the chemical in the fluid stream at each of the n stages (i.e. xi , i = 1, 2, . . . , n). Your function should take as input W (in kg/hr), S (in kg/hr), xin , yin , m, and n and should output xi , i = 1, . . . , n. It should use the spdiags function in Matlab to form the coefficient matrix and then 3 use the backslash operator \ to solve the system. As output, the function should return both the mass fraction in the fluid and in the solvent. Your function should contain no explicit looping structures. Turn-in a print out of your code. (b) Write another Matlab function that performs the same task as part (a), but that uses the tridisolve function from the NCM book to solve the linear system instead of the spdiag function combined with the backslash operator. (c) Show that your functions from part (a) and (b) produce the same output by solving the n-stage countercurrent chemical extraction problem with W = 145 kg/hr, S = 27 kg/hr, xin = 0.15, yin = 0, m = 7, and n = 51. To compare the results of the two functions, compute the sum of the absolute value of the differences between the mass fraction values returned by both functions. (d) Using your function from either part (a) or (b), make an appropriate plot of the mass fraction vs. the stages of the extraction for the problem in part (c). Include the mass fraction of the chemical in the fluid and in the solvent on the same plot. An appropriate plot means that I can see precisely how the mass fraction decays as the water flows through the stages. (e) By what percentage has the original chemical in the fluid been reduced by at the end of the extraction procedure using the parameters from part (c). What is the mass fraction of the chemical in the fluid and the solvent at Stage 26? 6. Heat transfer.(30 points) Imagine that we have a metal rod of length L (m) whose temperature at each end is held fixed at 20C o (293.15 K) and which is being heated by a flame held at the midpoint. We can approximate the steady state temperature at equally spaced points along the rod using the following model. We divide the rod into N intervals of equal length h as shown in the figure below. This partitioning of the rod gives us N + 1 equally spaced points xj = hj, h= L , N j = 0, 1, 2, . . . N The steady state temperature Tj (in Kelvin) at interior points xj in the rod can be modeled as Tj+1 + Tj−1 + h2 f (xj ), j = 1, 2, . . . , N − 1, Tj = 2 where T0 = TN = 293.15K. The flame f (x) is given by 2 ! S x − L/2 f (x) = exp − 2ρcp β ε where • S is the heating rate for the flame (W/m3 ) • β is the thermal diffusivity of the metal (m2 /s) • cp is the heat capacity of the metal (J/(kg · K)) • ρ is the density of the metal (kg/m3 ) • ε is a scaling factor specifying the flame width (m) (a) Using either the tridisolve or \spdiags function in Matlab , set up a linear system to solve for the steady state temperature distribution in the rod of length L = 1 and N = 201. (b) Solve the system for a thermal diffusivity of β = 10−3 , heat rate S = 1 × 104 , heat capacity of cp = 200, density of ρ = 10 and a scaling factor ε = 5 × 10−2 . (c) Plot the resulting temperature distribution as a function of x. Be sure to add labels and a legend to your graph. (d) How close to the center of the rod could you put your hand with out getting burned? Give the value of x and indicate where this point is on the graph from the previous part. Note that you need to find the approximate temperature at which skin burns. 4 h L 20°= =20° Figure 3: Schematic for the heated metal rod. 7. Cholesky factorization and sparse matrices. (15 points) As we discussed in class and in problem 2.5 from the book, if A is a symmetric positive definite matrix then A can be decomposed into the form A = LLT , where L is a lower triangular matrix (or A = RT R, where R is and upper triangular matrix). This decomposition is called the Cholesky decomposition. For this problem you will use the Cholesky function built into Matlab (chol), which is a very sophisticated implemenation that efficiently handles the decomposition of a sparse symmetric positive definite matrix by exploiting any structure formed by the non-zero entries in the matrix. (a) Download the file “bcsstk36.mat” from the course web page and save it to your working directory in Matlab . This file contains a sparse matrix that arises from a structural analysis of an automobile shock absorber assembly1 . Load the matrix into Matlab using the following series of commands: >>load bcsstk36; >>A = Problem.A; Use the spy command in Matlab to generate a plot showing the sparsity pattern of the matrix A. Compute the sparsity ratio of the A matrix by comparing the number of non-zero entries in A (use the nnz command) to the total number of entries in A. Report this number along with some descriptive text in the title of the spy plot of A. (b) Compute the Cholesky decomposition of the matrix from part (a) using the “lower” option in the chol function (this is faster for sparse matrices than the default setting). Plot the sparsity pattern of the lower triangular matrix from the decomposition. Compute the amount of “fill-in” from the Cholesky decomposition. The “fill-in” can be defined as the ratio of the number of non-zero entries in the Cholesky decomposition to the number of non-zero entries in the original matrix. Report this number along with some descriptive text in the title of the spy plot of the lower triangular matrix. (c) What you would like to happen is for the “fill-in” from the Cholesky decomposition to be as small as possible since this translate directly into a more computationally efficient method for solving the underlying linear system involving the matrix A. For any given sparse symmetric positive definite matrix, the “fill-in” that occurs from the Cholesky decomposition depends on how the rows and columns of the matrix are ordered. Several algorithms exist for permuting the rows and columns to try and minimize the “fill-in”. All these algorithms are based on results from Graph Theory. One of the more popular (and better) of these algorithms is the Symmetric Approximate Minimum Degree Permutation method. Matlab contains a function for this method called symamd. Your goal is to use this function on the original matrix A from part (a) to compare the “fill-in” from the Cholesky decomposition of the permuted A matrix to the “fill-in” from the original one. Create plots for the sparsity pattern of the permuted A matrix and its Cholesky decomposition. In the latter include the “fill-in”. 1 see http://www.cise.ufl.edu/research/sparse/matrices/Boeing/bcsstk36.html 5