1. Computer Abstractions and Technology
1. Introduction
Classes of computers: personal computers / servers / embedded computers >> personal mobile devices
1) how are programs written in a high-level language translated into the hardware's language, and how does the hardware execute them
2) what is the interface between software and hardware, and how does SW instruct HW to perform needed functions
3) what determines the performance of a program, and how can it be improved
4) what techniques can be used by HW designers to improve performance
5) what techniques can be used to improve energy efficiency
6) what are the consequences of the switch from sequential processing to parallel processing and multicore processors
2. Eight great ideas in computer tech
1) design for Moore's law : resources available per chip can double every 18-24 months (see the sketch after this list)
2) use abstraction to simplify design : hide lower-level detail to offer a simpler model at higher levels
3) make the common case fast : enhancing it helps performance more than optimizing the rare case
4) performance via parallelism : computer architects get more performance by performing operations in parallel
5) performance via pipelining : overlap the stages of successive operations, like a bucket brigade
6) performance via prediction : it can be faster to guess and start working than to wait until you know for sure, assuming the mechanism to recover from a misprediction is not too expensive and predictions are relatively accurate
7) hierarchy of memories : fastest, most expensive memory on the top and slowest, cheapest memory on the bottom of the hierarchy
8) dependability via redundancy : computers need not only to be fast but also dependable; include redundant components that can take over when a failure occurs
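A minimal numeric sketch of idea 1, assuming a 24-month doubling period (the note above says 18-24 months); the function name and values are illustrative only:

```python
# Compound growth of per-chip resources under Moore's law,
# assuming resources double every 24 months (illustrative assumption).
def resources_after(years, base=1.0, doubling_months=24):
    """Resources available per chip after `years`, relative to `base`."""
    return base * 2 ** (years * 12 / doubling_months)

# After a decade at a 24-month doubling period: 2^5 = 32x the resources.
print(resources_after(10))  # 32.0
```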
3. Below your program
HW > System software > Application software
operating system : handling basic input and output operations / allocating storage and memory / providing protected sharing of the computer among multiple applications
compilers : translate a high-level language into instructions the HW can execute
high-level programming language > assembly language > binary language = machine language
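As an analogy for this chain, Python's standard dis module disassembles a high-level function into the bytecode instructions its virtual machine executes; the add function here is only an illustrative stand-in:

```python
# The dis module (Python standard library) shows the low-level instructions
# the interpreter executes for a high-level statement, much as a compiler's
# output shows assembly for a C statement.
import dis

def add(b, c):
    a = b + c
    return a

# Prints instructions such as LOAD_FAST b, LOAD_FAST c, an add opcode
# (BINARY_ADD or BINARY_OP depending on Python version), STORE_FAST a.
dis.dis(add)
```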
4. Under the covers
inputting data / outputting data / processing data / storing data
input devices / output devices
liquid crystal displays
integrated circuit a.k.a chip
central processor unit a.k.a CPU
datapath : component of the processor that performs arithmetic operations
control : the part that commands the datapath, memory, and I/O devices
memory
dynamic random access memory a.k.a DRAM
cache memory : small, fast memory that acts as a buffer for a slower memory (toy sketch below, after the ABI entry)
static random access memory a.k.a SRAM : faster and less dense than DRAM
instruction set architecture : the abstract interface between the hardware and the lowest-level software
application binary interface a.k.a ABI : the user portion of the instruction set plus the operating system interfaces
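A toy sketch of the cache idea above: a dict plays the role of small, fast SRAM in front of a deliberately slow lookup. slow_main_memory_read and cached_read are hypothetical names, not a real memory API:

```python
import time

def slow_main_memory_read(address):
    """Stand-in for a slow main-memory access (hypothetical)."""
    time.sleep(0.01)          # pretend DRAM latency
    return address * 2        # pretend stored value

cache = {}  # small, fast buffer (plays the role of SRAM)

def cached_read(address):
    # Hit: serve from the fast buffer; miss: go to slow memory and keep a copy.
    if address not in cache:
        cache[address] = slow_main_memory_read(address)
    return cache[address]

cached_read(42)  # miss: pays the slow access
cached_read(42)  # hit: answered from the buffer
```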
STORAGE
volatile memory : retains data only while receiving power
nonvolatile memory : retains data even without power
main memory (primary memory) : volatile; DRAM in today's computers
secondary memory : nonvolatile; holds programs and data between runs
NETWORKS
communication : information is exchanged between computers
resource sharing : share I/O devices
nonlocal access : using a computer that is far away
local area network LAN
wide area network WAN
TECHNOLOGIES FOR BUILDING PROCESSORS AND MEMORY
transistors > combined into an integrated circuit (chip)
silicon, a semiconductor, is the base material
wafers > patterned wafers are diced into dies (chips); defects are unavoidable
dies contain flaws > yield is defined as the percentage of good dies out of the total number of dies on the wafer
Cost per die = (Cost per wafer) / (Dies per wafer * Yield)
Dies per wafer ≈ (Wafer area) / (Die area)
Yield = 1 / (1 + (Defects per area * Die area / 2))^2
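A worked example plugging assumed numbers into the three formulas above (a 300 mm wafer costing $1000, 1 cm^2 dies, 0.5 defects per cm^2; all values made up for illustration):

```python
import math

wafer_cost = 1000.0          # dollars (assumed)
wafer_diameter_mm = 300.0    # assumed
die_area_cm2 = 1.0           # assumed
defects_per_cm2 = 0.5        # assumed

wafer_area_cm2 = math.pi * (wafer_diameter_mm / 10 / 2) ** 2   # ~706.9 cm^2
dies_per_wafer = wafer_area_cm2 / die_area_cm2                 # ~707 dies
yield_ = 1 / (1 + defects_per_cm2 * die_area_cm2 / 2) ** 2     # 0.64
cost_per_die = wafer_cost / (dies_per_wafer * yield_)          # ~$2.21

print(dies_per_wafer, yield_, cost_per_die)
```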
PERFORMANCE
reducing response time : the time between the start and completion of a task, a.k.a execution time
or increasing bandwidth a.k.a throughput : the amount of work done per unit time
performance X = 1 / (Execution time X)
X is n times faster than Y >>
n = (Performance X) / (Performance Y) = (Execution time Y) / (Execution time X)
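A tiny worked example of this definition, with assumed execution times of 10 s for X and 15 s for Y:

```python
exec_time_x = 10.0  # seconds (assumed)
exec_time_y = 15.0  # seconds (assumed)

perf_x = 1 / exec_time_x
perf_y = 1 / exec_time_y
n = perf_x / perf_y  # equals exec_time_y / exec_time_x
print(n)             # 1.5, so X is 1.5 times faster than Y
```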
MEASURING PERFORMANCE
CPU execution time (CPU time) : the actual time the CPU spends computing for a specific task
user CPU time / system CPU time
clock cycle : the time for one clock period
clock period : the length of each clock cycle
CPU execution time for a program = (CPU clock cycles for a program) * (Clock cycle time) = (CPU clock cycles for a program) / (Clock rate)
CPU time A = (CPU clock cycles A) / (Clock rate A)
CPU clock cycles = (Instructions for a program) * (Average clock cycles per instruction)
Clock cycles per instruction (CPI) is the average number of clock cycles each instruction takes to execute. It provides one way of comparing two different implementations of the same instruction set architecture
CLASSIC CPU PERFORMANCE EQUATION
in terms of instruction count (the number of instructions executed by the program), CPI, and clock cycle time:
CPU time = (Instruction count) * CPI * (Clock cycle time) = (Instruction count * CPI) / (Clock rate)
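A worked example of the equation with assumed inputs (10^9 instructions, average CPI of 2.0, 4 GHz clock):

```python
instruction_count = 1e9   # assumed
cpi = 2.0                 # assumed average clock cycles per instruction
clock_rate = 4e9          # Hz (assumed)

cpu_clock_cycles = instruction_count * cpi   # 2e9 cycles
cpu_time = cpu_clock_cycles / clock_rate     # = IC * CPI / clock rate
print(cpu_time)                              # 0.5 seconds
```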
POWER WALL
clock rate and power increased rapidly but flattened off recently > running into the practical power limit for cooling commodity microprocessors
Energy ∝ (Capacitive load) * (Voltage)^2
Although dynamic energy is the primary source of energy consumption in CMOS, static energy consumption occurs because of leakage current that flows even when a transistor is off. In servers, leakage is typically responsible for 40% of the energy consumption. Thus, increasing the number of transistors increases power dissipation, even if the transistors are always off. A variety of design techniques and technology innovations are being deployed to control leakage, but it's hard to lower voltage further.
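A small sketch of the dynamic-energy relation above, using the proportional form with an arbitrary capacitive load; it shows why lowering the supply voltage (historically from about 5 V down to about 1 V) cut dynamic energy so dramatically:

```python
def dynamic_energy(capacitive_load, voltage):
    # Proportional form of the relation above; per 0->1->0 switching
    # transition the energy is roughly 1/2 * C * V^2.
    return capacitive_load * voltage ** 2

c = 1.0  # arbitrary capacitive load (assumed)
print(dynamic_energy(c, 5.0) / dynamic_energy(c, 1.0))  # 25.0x reduction
```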
Power is a challenge for integrated circuits for two reasons. First, power must be brought in and distributed around the chip; modern microprocessors use hundreds of pins just for power and ground! Similarly, multiple levels of chip interconnect are used solely for power and ground distribution to portions of the chip. Second, power is dissipated as heat and must be removed. Server chips can burn more than 100 watts, and cooling the chip and the surrounding system is a major expense in Warehouse Scale Computers (see Chapter 6).
UNIPROCESSOR TO MULTIPROCESSORS
Parallelism : companies now call the processors 'cores', and such microprocessors are 'multicore microprocessors'. Parallelism 1) increases the difficulty of programming, and 2) the program must be divided so that each processor has roughly the same amount of work at the same time, so that the overhead of scheduling and coordination doesn't fritter away the potential performance of parallelism (sketch below)
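A sketch of dividing work evenly across cores, assuming a simple sum split into equal chunks with Python's multiprocessing.Pool; for small problems, scheduling and coordination overhead can outweigh the gain:

```python
from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    step = n // workers
    # Equal-sized chunks so each core gets the same amount of work.
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    with Pool(workers) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total == n * (n - 1) // 2)  # True: matches the closed form
```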
REAL STUFF : BENCHMARKING
workload : the set of programs run on a computer
benchmarks : programs specifically chosen to measure performance
FALLACIES AND PITFALLS
Amdahl's Law : the performance enhancement possible with a given improvement is limited by the amount that the improved feature is used
Execution time after improvement = (Execution time affected by improvement) / (Amount of improvement) + (Execution time unaffected)
we use Amdahl's law to estimate the performance improvement when we know the time consumed by some function and its potential speedup (worked example below)
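A worked example of Amdahl's law with assumed numbers: a 100 s program in which an 80 s portion can be sped up 4x:

```python
def time_after_improvement(affected, speedup, unaffected):
    return affected / speedup + unaffected

new_time = time_after_improvement(80.0, 4.0, 20.0)
print(new_time)          # 40.0 s
print(100.0 / new_time)  # overall speedup is only 2.5x, not 4x
```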
MIPS (million instructions per second)
= (Instruction count) / (Execution time * 10^6)
specifies performance inversely to execution time
But 1) MIPS doesn't account for the capabilities of the instructions, 2) MIPS varies between programs on the same computer, so 3) a computer can't have a single MIPS rating
MIPS = (Instruction count) / (((Instruction count * CPI) / (Clock rate)) * 10^6) = (Clock rate) / (CPI * 10^6)
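A sketch of why MIPS can mislead, with assumed numbers: A executes more but simpler instructions (lower CPI), B executes fewer, slower ones; A earns the higher MIPS rating even though B finishes the same program sooner:

```python
def exec_time(instruction_count, cpi, clock_rate):
    return instruction_count * cpi / clock_rate

def mips(instruction_count, exec_time_s):
    return instruction_count / (exec_time_s * 1e6)

t_a = exec_time(10e9, 1.0, 4e9)   # 2.5 s
t_b = exec_time(2e9, 2.0, 4e9)    # 1.0 s, so B is actually faster
print(mips(10e9, t_a), mips(2e9, t_b))  # 4000.0 vs 2000.0, A "wins" MIPS
```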