GPU Computing Resources and Community at the University of Sheffield
by Dr Paul Richmond (University of Sheffield)
Get the starting code from GitHub by cloning the master branch of the CUDALab01 repository from the RSE-Sheffield GitHub account, e.g.
git clone https://github.com/RSE-Sheffield/CUDALab01.git
This will check out all the starting code for you to work with.
Exercise 1 requires that we decipher some encrypted text. The text provided in the file encrypted01.bin has been encrypted using an affine cipher. The affine cipher is a very simple type of monoalphabetic substitution cipher in which each character of the alphabet, represented by its numerical value, is encrypted using a mathematical function. The encryption function is defined as:
$$E(x) = (Ax + B) \bmod M$$
where $A$ and $B$ are the keys of the cipher, mod is the modulo operation, and $A$ and $M$ are co-prime. For this exercise the value of … is 27 and $M$ is 128 (the size of the ASCII alphabet). The affine decryption function is defined as:
$$D(x) = A^{-1}(x - B) \bmod M$$
where $A^{-1}$ is the modular multiplicative inverse of $A$ modulo $M$. For this exercise $A^{-1}$ has a value of …
Note: The mod operation is not the same as the remainder operator (%) for negative numbers. A suitable mod function has been provided for the example.
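To illustrate the note above, the following is a minimal sketch of how the remainder operator and a mathematical modulo differ for negative values, which matters because the term (x - B) in the decryption function can be negative. The helper name floor_mod and the functions built on it are hypothetical and are not the mod function supplied with the lab code. The key values (15, 27, and the inverse 111) are assumptions chosen only because 15 x 111 = 1665 = 13 x 128 + 1; use the values given for the exercise.

```cuda
#include <cstdio>

// Illustrative values only (assumptions, not taken from the exercise text).
#define KEY_A 15       // hypothetical key, co-prime with KEY_M
#define KEY_B 27       // hypothetical key
#define KEY_M 128      // size of the ASCII alphabet
#define KEY_A_INV 111  // inverse of KEY_A modulo KEY_M: (15 * 111) % 128 == 1

// Mathematical modulo: for b > 0 the result always lies in [0, b).
// Hypothetical helper, callable from both host and device code.
__host__ __device__ int floor_mod(int a, int b) {
    int r = a % b;               // in C/C++ the remainder takes the sign of a
    return (r < 0) ? r + b : r;  // shift negative remainders into [0, b)
}

// E(x) = (A * x + B) mod M
__host__ __device__ int affine_encrypt_value(int x) {
    return floor_mod(KEY_A * x + KEY_B, KEY_M);
}

// D(x) = A^-1 * (x - B) mod M
__host__ __device__ int affine_decrypt_value(int x) {
    return floor_mod(KEY_A_INV * (x - KEY_B), KEY_M);
}

int main() {
    printf("-5 %% 128          = %d\n", -5 % 128);            // prints -5
    printf("floor_mod(-5, 128) = %d\n", floor_mod(-5, 128));  // prints 123

    int e = affine_encrypt_value('H');                        // 'H' is 72
    printf("'H' -> %d -> '%c'\n", e, affine_decrypt_value(e)); // 'H' -> 83 -> 'H'
    return 0;
}
```

Marking the helper as both host and device code lets the same function be checked on the CPU and then called from a kernel.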
As each of the encrypted character values is independent, we can use the GPU to decrypt them in parallel. To do this we will launch a thread for each of the encrypted character values and use a kernel function to perform the decryption. Starting from the code provided in exercise01.cu, complete the following:
… affine_decrypt kernel. Run the program as a … using qsub by modifying the …
… 1024). The function should store the result in d_output. You can define the inverse modulus M using a C pre-processor definition.
… d_input) and output (…
… h_input to the device memory …
… N threads and launch the …
… d_output to the host memory …
… affine_decrypt_multiblock kernel which should work when using multiple blocks of threads. Change your grid and block dimensions so that you launch …
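The steps in this exercise follow the usual CUDA pattern: allocate device memory, copy the input to the device, launch a kernel, and copy the result back to the host. The following is a minimal sketch of that pattern for a multi-block decryption, not the lab solution. The key values, the helper floor_mod, the kernel name decrypt_sketch, the host array name h_output, and the launch configuration of 8 blocks of 128 threads are all assumptions made for illustration. The input here is a short string encrypted on the host rather than the contents of encrypted01.bin.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

#define N 1024             // number of character values processed
#define THREADS 128        // threads per block (assumed for this sketch)
#define KEY_A 15           // hypothetical key values, see lead-in
#define KEY_B 27
#define KEY_M 128
#define KEY_A_INV 111      // (15 * 111) % 128 == 1

// Hypothetical modulo helper that never returns a negative value for b > 0.
__host__ __device__ int floor_mod(int a, int b) {
    int r = a % b;
    return (r < 0) ? r + b : r;
}

// One thread per character value. With several blocks, the global index
// combines the block index and the thread index within the block.
__global__ void decrypt_sketch(const int *d_in, int *d_out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d_out[i] = floor_mod(KEY_A_INV * (d_in[i] - KEY_B), KEY_M);
}

int main() {
    const char *message = "Hello from the GPU";
    int len = (int)strlen(message);
    int h_input[N], h_output[N];

    // Encrypt on the host so that there is something to decrypt.
    for (int i = 0; i < N; i++) {
        int c = (i < len) ? message[i] : ' ';
        h_input[i] = floor_mod(KEY_A * c + KEY_B, KEY_M);
    }

    // Allocate device memory and copy the input to the device.
    int *d_input = NULL, *d_output = NULL;
    size_t size = N * sizeof(int);
    cudaMalloc((void **)&d_input, size);
    cudaMalloc((void **)&d_output, size);
    cudaMemcpy(d_input, h_input, size, cudaMemcpyHostToDevice);

    // Launch N / THREADS = 8 blocks of 128 threads.
    dim3 threadsPerBlock(THREADS);
    dim3 blocksPerGrid(N / THREADS);
    decrypt_sketch<<<blocksPerGrid, threadsPerBlock>>>(d_input, d_output, N);

    // Copy the result back; this call waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_output, d_output, size, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Rebuild the text from the decrypted values and print it.
    char text[N + 1];
    for (int i = 0; i < N; i++) text[i] = (char)h_output[i];
    text[len] = '\0';
    printf("%s\n", text);

    cudaFree(d_input);
    cudaFree(d_output);
    return 0;
}
```

With a single block the index reduces to threadIdx.x alone; the multi-block form above also covers that case because blockIdx.x is then always zero.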
In exercise 2 we are going to extend the vector addition example from the lecture. The file exercise02.cu has been provided as a starting point. Perform the following modifications:
… vectorAddCPU) storing the result in an array called c_ref.
Implement a new function validate which compares the GPU result to the CPU result. It should print an error for each value which is incorrect and return a value indicating the total number of errors. You should also print the number of errors to the console. Now fix the error and confirm your error check code works.
… 2050. Do not run your code yet, as it will now perform unsafe writes beyond the bounds of the memory which you have allocated. This is because a whole thread block is required for the extra two threads (our grid is always made up of entire blocks). Modify the kernel by adding a check so that you do not write beyond the bounds of the allocated memory. This requires you to ensure that the thread's unique position, which it uses to index into memory, is less than N (valid indices run from 0 to N - 1). Threads which fail this test should do nothing.
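The bounds check and the CPU validation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the lab solution: the block size of 256 threads, the function signatures, and the use of int data are all assumed, and only the names vectorAddCPU, validate and c_ref come from the exercise text.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 2050                 // not a multiple of the block size
#define THREADS_PER_BLOCK 256  // assumed block size for this sketch

// Each thread adds one element; threads past the end of the arrays do nothing.
__global__ void vectorAdd(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// CPU reference implementation.
void vectorAddCPU(const int *a, const int *b, int *c_ref, int n) {
    for (int i = 0; i < n; i++)
        c_ref[i] = a[i] + b[i];
}

// Compare the GPU result with the CPU result and return the error count.
int validate(const int *c, const int *c_ref, int n) {
    int errors = 0;
    for (int i = 0; i < n; i++) {
        if (c[i] != c_ref[i]) {
            fprintf(stderr, "Error at %d: GPU %d != CPU %d\n", i, c[i], c_ref[i]);
            errors++;
        }
    }
    return errors;
}

int main() {
    size_t size = N * sizeof(int);
    int *a = (int *)malloc(size), *b = (int *)malloc(size);
    int *c = (int *)malloc(size), *c_ref = (int *)malloc(size);
    for (int i = 0; i < N; i++) { a[i] = rand() % 100; b[i] = rand() % 100; }

    int *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    // Round up so the last, partially used block is still launched:
    // (2050 + 255) / 256 = 9 blocks, i.e. 2304 threads for 2050 elements.
    int blocks = (N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    vectorAdd<<<blocks, THREADS_PER_BLOCK>>>(d_a, d_b, d_c, N);

    cudaError_t err = cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    vectorAddCPU(a, b, c_ref, N);
    int errors = validate(c, c_ref, N);
    printf("%d blocks launched, %d errors\n", blocks, errors);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c); free(c_ref);
    return errors != 0;
}
```

Passing n as a kernel argument rather than relying on the N macro keeps the check correct if the kernel is later reused with a different size.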
We are going to implement a matrix addition kernel. In matrix addition, two matrices of the same dimensions are added entry-wise. Start from your exercise 2 code by copying the file to a new file called exercise03.cu. It will require the following changes:
… size so that you allocate enough memory for a matrix size of N x N and move the correct amount of data using …
… random_ints function to generate a random matrix rather than a vector.
… matrixAddCPU and update the validate function.
… 256 threads per block. Create a new kernel (matrixAdd) to perform the matrix addition. Hint: You might find it helps to reduce N so that a single thread block covers the matrix while you test your code.
… N x M for any size.
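The changes above can be sketched as a two-dimensional launch in which each thread handles one matrix entry. This is a minimal illustration under stated assumptions, not the lab solution: the 16 x 16 block shape (256 threads), the row-major layout, the matrix sizes, and the function signatures are all assumed. Only the names matrixAdd and matrixAddCPU come from the exercise text.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 100         // rows (hypothetical size, not a multiple of 16)
#define M 70          // columns (hypothetical size, non-square on purpose)
#define BLOCK_DIM 16  // 16 x 16 = 256 threads per block

// Each thread adds one entry of a row-major rows x cols matrix.
__global__ void matrixAdd(const int *a, const int *b, int *c, int rows, int cols) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {   // skip threads outside the matrix
        int i = row * cols + col;     // flatten the 2D position
        c[i] = a[i] + b[i];
    }
}

// CPU reference implementation over the same row-major layout.
void matrixAddCPU(const int *a, const int *b, int *c_ref, int rows, int cols) {
    for (int i = 0; i < rows * cols; i++)
        c_ref[i] = a[i] + b[i];
}

int main() {
    size_t size = (size_t)N * M * sizeof(int);
    int *a = (int *)malloc(size), *b = (int *)malloc(size);
    int *c = (int *)malloc(size), *c_ref = (int *)malloc(size);
    for (int i = 0; i < N * M; i++) { a[i] = rand() % 100; b[i] = rand() % 100; }

    int *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    // Round up in each dimension: x covers the columns, y covers the rows.
    dim3 threads(BLOCK_DIM, BLOCK_DIM);
    dim3 blocks((M + BLOCK_DIM - 1) / BLOCK_DIM, (N + BLOCK_DIM - 1) / BLOCK_DIM);
    matrixAdd<<<blocks, threads>>>(d_a, d_b, d_c, N, M);

    cudaError_t err = cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Count mismatches against the CPU reference.
    matrixAddCPU(a, b, c_ref, N, M);
    int errors = 0;
    for (int i = 0; i < N * M; i++)
        if (c[i] != c_ref[i]) errors++;
    printf("grid %u x %u blocks, %d errors\n", blocks.x, blocks.y, errors);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c); free(c_ref);
    return errors != 0;
}
```

Mapping the x dimension to columns means that neighbouring threads in a block touch neighbouring memory addresses in a row-major matrix.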
The exercise solutions are available from the solutions branch of the repository. To check these out, either clone that branch into a new directory using the -b option as follows:
git clone -b solutions https://github.com/RSE-Sheffield/CUDALab01.git
Alternatively, commit your changes and switch branch:
git commit -a -m "my local changes to src files"
git checkout solutions
You will need to commit your local changes to avoid overwriting your changes when switching to the solutions branch. You can then return to your modified versions by checking out the master branch.