Squid Internet Object Cache - CSIE - NCKU

Web Cache Replacements
張燕光
Department of Computer Science and Information Engineering
National Cheng Kung University
[email protected]
Introduction
• Which page should be removed from the cache?
– Finding a replacement algorithm that can yield a
high hit rate.
• Differences from traditional caching
– non-homogeneity of the object sizes
– for objects with the same access frequency but different
sizes, smaller objects are favored if only hit rate is considered
• Byte hit rate
2
Introduction
• Other considerations
– transfer time cost
– expiration time
– access frequency
• Measurement metrics?
• Admission control?
• When or how often to perform the
replacement operations?
• How many documents to remove?
3
Measurement Metrics
• Hit Rate (HR):
– % requests satisfied by cache
– (shows fraction of requests not sent to server)
• Volume measures:
– Weighted hit rate (WHR): Byte Hit Ratio
• % client-requested bytes returned by proxy (shows
fraction of bytes not sent by server)
– Fraction of packets not sent
– Reduction in distance traveled (e.g., hop count)
• Latency Time
4
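As a concrete illustration of the request-based versus volume-based measures above, here is a minimal Python sketch (hypothetical helper, not from the slides) that computes HR and byte hit ratio from a request log of (size_in_bytes, was_hit) records:

def hit_ratios(log):
    # log: list of (size_in_bytes, was_hit) tuples, one per client request
    requests = len(log)
    total_bytes = sum(size for size, _ in log)
    hr = sum(1 for _, hit in log if hit) / requests            # fraction of requests served from cache
    bhr = sum(size for size, hit in log if hit) / total_bytes  # fraction of bytes served from cache
    return hr, bhr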
Three Categories
• Traditional replacement policies and their
direct extensions:
– LRU, LFU, …
• Key-based replacement policies:
• Cost-based replacement policies:
5
Traditional replacement
• Least Recently Used (LRU) evicts the
object which was requested the least
recently
– prune off as many of the least recently used
objects as is necessary to have sufficient space
for the newly accessed object.
– This may involve zero, one, or many
replacements.
6
Traditional replacement
• Least Frequently Used (LFU) evicts the
object which is accessed least frequently.
• Pitkow/Recker evicts objects in LRU order,
except if all objects are accessed within the
same day, in which case the largest one is
removed.
7
Key-based Replacement
• The idea in key-based policies is to sort
objects based upon a primary key, break ties
based on a secondary key, break remaining
ties based on a tertiary key, and so on.
8
Key-based Replacement
• LRUMIN:
– This policy is biased in favor of smaller sized
objects so as to minimize the number of objects
replaced.
– Let the size of the incoming object be S.
Suppose that this object will not fit in the cache.
• If there are any objects in the cache which have size
at least S, we remove the least recently used such
object from the cache.
• If there are no objects with size at least S, then we
start removing objects in LRU order of size at least
S/2, then objects of size at least S/4, and so on until
enough free cache space has been created.
9
Key-based Replacement
• SIZE policy:
– In this policy, the objects are removed in order
of size, with the largest object removed first.
– Ties based on size are somewhat rare, but when
they occur they are broken by considering the
time since last access. Specifically, objects
with higher time since last access are removed
first.
10
Key-based Replacement
• LRU-Threshold is the same as LRU, but
objects larger than a certain threshold size
are never cached.
• Hyper-G is a refinement of LFU, breaking ties
using the recency of last use and size.
• Lowest Latency First minimizes average
latency by evicting the document with the
lowest download latency first.
11
Cost-based Replacement
• Employ a potential cost function derived from
different factors such as
– time since last access,
– entry time of the object in the cache,
– transfer time cost,
– object expiration time, and so on.
• GreedyDual-Size (GD-Size) associates a cost with
each object and evicts object with the lowest
cost/size.
• Hybrid associates a utility function with each
object and evicts the one with the least utility, to
reduce the total latency.
12
Cost-based Replacement
• Lowest Relative Value evicts the object with
the lowest utility value.
• Least Normalized Cost Replacement (LNC-R)
employs a rational function of the access
frequency, the transfer time cost, and the size.
• Bolot/Hoschka employs a weighted rational
function of the transfer time cost, the size, and
the time since last access.
13
Cost-based Replacement
• Size-Adjusted LRU (SLRU) orders objects by the
ratio of cost to size and chooses the objects with the
best cost-to-size ratio for replacement.
• Server-assisted scheme models the value of
caching an object in terms of its fetching cost, size,
next request time, and cache prices during the time
period between requests. It evicts the object of the
least value.
• Hierarchical GreedyDual (Hierarchical GD) does
object placement and replacement cooperatively in
a hierarchy.
14
GreedyDual
• GreedyDual was originally proposed by Young
and Tarjan for the case where pages in a cache
have the same size but incur different costs to
fetch from secondary storage.
• A value H is associated with each page p when it
is brought into the cache.
– H is set to the cost of bringing p into the cache
– the cost is always nonnegative.
• (1) The page with the lowest H value (minH) is
replaced, and (2) all remaining pages then reduce
their H values by minH.
15
GreedyDual
• If a page is accessed, its H value is restored
to the cost of bringing it into the cache
• Thus the H values of recently accessed
pages retain a larger portion of the original
cost than the pages that have not been
accessed for a long time
• By reducing the H values as time goes on
and restoring them upon access,
GreedyDual integrates the locality and cost
concerns in a seamless fashion
16
GreedyDual-Size
• Setting H to cost/size upon accesses to a
document, where cost is the cost of bringing
the document and size is the size of the
document in bytes
– this extended version is called GreedyDual-Size
• The definition of cost depends on the goal of
the replacement algorithm; cost is set to
– 1 if the goal is to maximize hit ratio
– the downloading latency if the goal is to
minimize average latency
– the network cost if the goal is to minimize the total
cost
17
GreedyDual-Size
• Implementation:
– The basic algorithm needs to decrement the H values
of all cached pages by minH every time a page is
replaced, which may be very inefficient
– The improved algorithm on the next slide avoids this
by keeping an inflation value L
– Maintain a priority queue based on H
– Handling a hit requires O(log k) time, and
– handling an eviction requires O(log k) time,
since in both cases the queue must be updated
18
GreedyDual-Size
Algorithm GreedyDual (document p)
/* Initialize L ← 0 */
(1) If p is already in memory,
(2)     H(p) ← L + cost(p)/size(p)
(3) If p is not in memory,
(4)     while there is not enough room in memory for p,
(5)         let L ← min H(q) over all q in cache
(6)         evict q such that H(q) = L
(7)     put p into memory and set H(p) ← L + cost(p)/size(p)
19
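The pseudocode above maps directly onto a priority queue with lazy deletion. Below is a minimal Python sketch under assumed interfaces: requests is an iterable of document ids, and cost(p) / size(p) are caller-supplied functions (e.g., cost(p) = 1 to maximize hit ratio). It illustrates the technique; it is not the Squid implementation. The inflation value L is what avoids decrementing every cached page's H on eviction, giving the O(log k) bounds on the previous slide.

import heapq

def greedy_dual_size(requests, capacity, cost, size):
    L = 0.0        # inflation value
    H = {}         # current H value of each cached document
    used = 0       # bytes currently cached
    heap = []      # (H value, doc) entries; stale entries are skipped lazily
    hits = 0
    for p in requests:
        if p in H:
            hits += 1
        elif size(p) > capacity:
            continue                                # object larger than the cache: do not cache
        else:
            while used + size(p) > capacity:
                h, q = heapq.heappop(heap)
                if q in H and H[q] == h:            # ignore stale heap entries
                    L = h                           # L <- min H(q) over all cached q
                    used -= size(q)
                    del H[q]                        # evict q such that H(q) = L
            used += size(p)
        H[p] = L + cost(p) / size(p)                # (re)set H on admission or hit
        heapq.heappush(heap, (H[p], p))
    return hits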
Hybrid Algorithm (HYB)
• Motivated by Bolot and Hoschka's algorithm.
• HYB is a hybrid of several factors,
considering not only download time but also
number of references to a document and
document size. HYB selects for replacement
the document i with the lowest value of the
following expression:
20
HYB
• Utility function is defined as follows
(Cs + Wb/bs) · (np)^Wn / Zp
– Cs is the estimated time to connect to the server
– bs is the estimated bandwidth to the server
– Zp is the size of the document
– np is the number of times the document has been referenced
– Wb and Wn are constants that set the relative
importance of the variables bs and np, respectively
21
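Written out in code, the HYB utility of a document is a one-liner; the document with the lowest value is evicted. A minimal sketch (the default Wb and Wn values here are illustrative placeholders, not values taken from the slides):

def hyb_utility(c_s, b_s, z_p, n_p, w_b=8192.0, w_n=0.9):
    # (Cs + Wb/bs) * (np)^Wn / Zp  --  evict the document with the LOWEST value
    return (c_s + w_b / b_s) * (n_p ** w_n) / z_p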
Latency Estimation Algo. (LAT) [REF]
• Motivated by estimating the time required to
download a document and then replacing the document
with the smallest download time.
• Apply some function to combine (e.g., smooth) these
time samples to form an estimate of how long it will
take to download the document
– keeping a per-document estimate is probably not practical.
– Alternative: keep statistics of past downloads on a per-server
basis, rather than a per-document basis (less storage)
• For each server j, the proxy maintains
– clatj: estimated latency (time) to open a connection to the server
– cbwj: estimated bandwidth of the connection (in
bytes/second)
22
Latency Estimation Algo. (LAT) [REF]
– When a new document is received from server j, the
connection establishment latency (sclat) and the bandwidth for
that document (scbw) are measured, and the estimates are
updated as follows:
clatj = (1-ALPHA) clatj + ALPHA sclat
cbwj = (1-ALPHA) cbwj + ALPHA scbw
– ALPHA is a smoothing constant, set to 1/8 as it is in the
TCP smoothed estimation of RTT
– Let ser(i) denote the server on which document i resides,
and si denote the document size. The cache replacement
algorithm LAT selects for replacement the document i
with the smallest download time estimate, denoted di:
– di = clatser(i) + si/cbwser(i)
23
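A minimal sketch of the per-server bookkeeping described above, assuming a plain dict keyed by server name (the helper names are mine, not from the LAT description):

ALPHA = 1 / 8  # smoothing constant, as in TCP's smoothed RTT estimation

def update_estimates(est, server, sclat=None, scbw=None):
    # EWMA update of (clat_j, cbw_j) for server j
    clat, cbw = est.get(server, (0.0, 1.0))
    if sclat is not None:
        clat = (1 - ALPHA) * clat + ALPHA * sclat
    if scbw is not None:
        cbw = (1 - ALPHA) * cbw + ALPHA * scbw
    est[server] = (clat, cbw)

def download_time(est, server, size):
    # d_i = clat_ser(i) + s_i / cbw_ser(i); LAT evicts the document with the smallest d_i
    clat, cbw = est[server]
    return clat + size / cbw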
Latency Estimation Algo. (LAT)
• One detail remains:
– a proxy runs at the application layer of a network protocol
stack, and therefore would not be able to obtain the
connection latency samples sclat.
– Therefore the following heuristic is used to estimate
connection latency. A constant CONN is chosen (e.g.,
2Kbytes). Every document that the proxy receives whose
size is less than CONN is used as an estimate of
connection latency sclat.
– Every document whose size exceeds CONN is used as a
bandwidth sample as follows:
scbw = document size / (download time of document – current value of clatj).
24
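The CONN heuristic can feed the two estimators directly; a sketch reusing update_estimates from the previous block. Note that the bandwidth sample is taken as bytes divided by the transfer time remaining after connection setup, so that it has units of bytes/second (that interpretation, and the constant below, are assumptions):

CONN = 2048  # threshold in bytes (e.g., 2 Kbytes)

def record_download(est, server, size, total_time):
    if size < CONN:
        # small documents approximate the connection-establishment latency sclat
        update_estimates(est, server, sclat=total_time)
    else:
        # larger documents give a bandwidth sample scbw
        clat, _ = est.get(server, (0.0, 1.0))
        transfer_time = max(total_time - clat, 1e-6)
        update_estimates(est, server, scbw=size / transfer_time)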
Lowest Relative Value (LRV)
• time from the last access t: chosen for its large
influence on the probability of a new access
– the probability of a new access conditioned
on the time from the last access can be
expressed as (1 - D(t))
• # of previous accesses i: this parameter
allows the proxy to select a relatively small
number of documents with a much higher
probability of being accessed again
• document size s: this seems to be the most
effective parameter for making a selection
among documents with only one access
25
Distribution of interaccess times, D(t)
26
Prob. Density function of interaccess times, d(t)
27
Lowest Relative Value (LRV)
• We compute the probability that a document
is accessed again, Pr(i, t, s), as follows
Pr(i, t, s) = P1(s)(1 - D(t)) if i = 1
Pr(i, t, s) = Pi (1 – D(t)) otherwise
– Pi: conditional probability that a document is
referenced i+1 times, given that it has been
accessed i times
– P1(s): percentage of documents of size s with at least 2
accesses
– D(t): distribution of times between
consecutive requests to the same document,
derived as D(t) = 0.035 log(t+1) + 0.45 (1 - e^(-t/2E6))
28
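A small Python sketch of the pieces above. Pi and P1(s) are trace-derived statistics that the caller must supply, and the slide does not say whether log is natural or base-10 (natural log is assumed here):

import math

def D(t):
    # D(t) = 0.035*log(t+1) + 0.45*(1 - e^(-t/2e6))
    return 0.035 * math.log(t + 1) + 0.45 * (1 - math.exp(-t / 2e6))

def p_access_again(i, t, s, P, P1):
    # Pr(i, t, s): probability that a document is accessed again
    if i == 1:
        return P1(s) * (1 - D(t))
    return P[i] * (1 - D(t))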
Lowest Relative Value (LRV)
29
Lowest Relative Value (LRV)
30
Performance from Pei Cao
• Use hit ratio, byte hit ratio, reduced latency and
reduced hops
– reduced latency = the sum of downloading latency
for the pages that hit in cache as a percentage of
the sum of all downloading latencies
– reduced hops = the sum of the network costs for the
pages that hit in cache as a percentage of the sum
of the network costs of all Web pages
• model network cost of each document as hops
– Web server has hop value: 1 or 32; we assign 1/8 of
servers with hop value 32 and 7/8 with hop value 1
– The hop value can be thought of either as the
number of network hops traveled by a document or
as the monetary cost associated with the document
31
Performance from Pei Cao
• GD-Size(1) sets cost of each document to be
1, thus trying to maximize hit ratio
• GD-Size(packets) sets the cost for each
document to 2+size/536, i.e. estimated
number of network packets sent and received
if a miss to the document happens
– 1 packet for the request, 1 packet for the reply, and
size/536 for extra data packets, assuming a 536-byte TCP segment size.
– It tries to maximize both hit ratio and byte hit ratio
• Finally GD-Size(hops) sets the cost for each
document to the hop value of the document
trying to minimize network costs
32
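For reference, the three cost settings compared here are easy to state in code; a sketch in which doc is assumed to expose its size in bytes and its hop value:

def gd_size_cost(doc, goal):
    if goal == "hits":
        return 1                    # GD-Size(1): maximize hit ratio
    if goal == "packets":
        return 2 + doc.size / 536   # GD-Size(packets): request + reply + data packets
    if goal == "hops":
        return doc.hops             # GD-Size(hops): minimize network cost
    raise ValueError(goal)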
Performance from Pei Cao
• See Cao’s paper: page 4
33
Weighted Hit Rate
• Results on best primary key are
inconclusive
• Most references are from small files,
but most bytes are from large files
• Why Size?
– Most accesses are for smaller
documents
– A few large documents take the space
of many small documents
– Concentration of large inter-reference
times
34
Exp. 3: Partitioning Cache by Media
• Idea
– Do clients that listen to music degrade the
performance of clients using text and graphics?
– Could a partitioned cache with one portion
dedicated to audio, and the other to non-audio
documents increase the WHR experienced by
either audio or non-audio documents?
• Simulate
– cache size = 10% of max needed
– two partitions: audio and non-audio
35
Exp. 4: Partitioning Cache by Media
• In Experiment 4,
– a one-level cache with SIZE as the primary key
– random as the secondary key
– three partition sizes: dedicate 1/4, 1/2, or 3/4 of
the cache to audio;
– the rest is dedicated to non-audio documents.
36
Exp. 4: Partitioning Cache by Media
37
Exp. 4: Partitioning Cache by Media
38
Problems to solve
• Certain sorting keys have intuitive appeal.
– The first is document type. A sorting key that puts
text documents at the front of the removal queue
would ensure low latency for text in Web pages, at
the expense of latency for other document types.
– The second sorting key is refetch latency. To a
user of international documents, the most obvious
caching criterion is one that caches documents to
minimize overall latency.
• A European user of North American documents
would preferentially cache those documents over
ones from other European servers to avoid using
heavily utilized transatlantic network links. Therefore
a means of estimating the latency for refetching
documents in a cache could be used as a primary
sorting key.
39
Problems to solve
• Caching dynamic documents: a cache is
useless for a dynamic document only if
the document content changes completely;
otherwise a portion, but not all,
of the cached copy remains valid.
– allow caches to request the differences
between the cached version and the latest
version of a document.
40
Problems to solve
• For example, in response to a conditional GET
a server could send the “diff” of the current
version and the version matching the Last-Modified
date sent by the client; or a specific
tag could allow a server to “fill in” a previously
cached static “query response form.”
– Another approach to changing semi-static
pages (i.e., pages that are HTML but
replaced often) is to allow Web servers to
preemptively update inconsistent
document copies, at least for the most
popular ones.
41
Randomized Strategies
• These strategies use randomized decisions
to find an object for replacement.
42
Randomized Strategies
• 1. RAND
– This strategy removes a random object.
• 2. HARMONIC [Hosseini-Khayat 1997]
– While RAND uses equal probability for each object,
HARMONIC removes one item from the cache at
random, with a probability inversely
proportional to its specific cost ci/si.
43
Randomized Strategies
• 3. LRU-C and LRU-S [Starobinski and Tse
2001].
– LRU-C is a randomized version of LRU.
Let cmax = max{c1,…,cN} be the maximum of the
access costs of all N objects of a request
sequence, and let ĉi = ci/cmax be the normalized cost of
object i. When an object i is requested, it is
moved to the head of the cache with
probability ĉi; otherwise, nothing is done.
44
Randomized Strategies
• LRU-S uses the size instead of the cost. Let smin = min{s1,…,sN} be
the size of the smallest object among the N documents, and let di
= smin/si be the normalized density of object i.
• On a request, LRU-S acts as LRU with probability di; otherwise the cache
state is left unmodified.
– Furthermore, Starobinski and Tse [2001] proposed an
algorithm which deals with both varying-size and varying-cost objects.
– The following quantities are defined:
ρi = ci/si;   ρmax = maxi{ρi};   ρ̃i = ρi/ρmax
Upon a request for object i, this algorithm performs the same
operation as LRU with probability ρ̃i, and with probability 1 - ρ̃i it leaves
the cache state unmodified.
45
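A minimal sketch of the randomized promotion step, assuming the cache is an OrderedDict whose front is treated as the LRU head; ρmax is assumed to be known or tracked by the caller:

import random
from collections import OrderedDict

def rho_tilde(cost, size, rho_max):
    # combined form: rho_i = c_i/s_i, normalized by rho_max
    # (LRU-C corresponds to c_i/c_max, LRU-S to s_min/s_i)
    return (cost / size) / rho_max

def on_request(cache: OrderedDict, key, prob):
    # with probability prob perform the usual LRU promotion; otherwise leave the state unchanged
    if key in cache and random.random() < prob:
        cache.move_to_end(key, last=False)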
Randomized Strategies
• 4. Randomized replacement with general value
functions [Psounis and Prabhakar 2001].
– This strategy draws N objects randomly from the
cache and evicts the least useful object in the
sample. The usefulness of a document can be
determined by any utility function. After replacing
the least useful object, the next M(M < N) least
useful objects are retained in memory.
– At the next replacement, N − M new samples are
drawn from the cache and the least useful of these
N−M and M previously retained is evicted. The M
least useful of the remaining are stored in memory
and so on.
46
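A sketch of this sampling scheme with hypothetical parameters N and M; utility may be any value function over cached objects. Each call returns the evicted key and the new retained set, e.g. victim, retained = sample_based_evict(cache, utility, retained):

import random

def sample_based_evict(cache, utility, retained, N=30, M=10):
    # draw fresh samples and pool them with the M candidates retained from the last round
    fresh = random.sample(list(cache.keys()), min(N - len(retained), len(cache)))
    pool = sorted(set(fresh) | retained, key=lambda k: utility(cache[k]))
    victim = pool[0]                      # evict the least useful object in the pool
    del cache[victim]
    return victim, set(pool[1:1 + M])     # keep the next M least useful for the next round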
Randomized Strategies Summary
1. Randomization presents a different approach to
cache replacement.
2. Randomized strategies try to reduce the
complexity of the replacement process
without sacrificing the quality too much.
47
Admission control
• Should we store the response in the cache or not?
• Do not store an object the first time it is requested.
48
Admission control
• Heuristic used to make this decision: the objects
accessed most frequently in the recent past are the
most likely to be accessed again. The words
“frequently” and “recently” imply that the access
frequency of objects and a decay
function applied to that frequency are needed.
• An extra space called the URL memory cache is
introduced to store the URLs and the associated
access frequencies of the requested objects.
49
Admission control
• If the requested object is cacheable, the
process of storing the object in disk cache is
delayed until the same object is accessed
again. (Or we can say that cacheable objects
are not stored in disk cache unless they have
been accessed before. )
• Since the access stream is infinite, the size
of URL cache must be limited. A
replacement policy is also needed in URL
cache.
50
Admission control: operations
• Cache hits:
– The operations are similar to the original algorithm.
– In addition to unused non-cacheable objects and hot
objects in memory cache, the cacheable objects without
disk copies are also the candidates for replacement in
memory cache.
– Consider the case that a copy of the requested object exists
in memory cache but not in disk cache.
• The reference count associated with the requested object in
memory cache is incremented by one and the data is then
stored in disk cache.
• If an object evicted from memory cache is cacheable, its
URL along with its reference count is then stored in URL
cache.
51
Admission control: operations
• Cache misses for cacheable objects :
– If the requested object is cacheable, the caching algorithm
checks
(1) if its URL is not stored in URL cache.
• Replacement operations are performed to allocate
enough space for holding the requested object.
• The URL of the replaced object is then stored in URL cache
along with its reference count.
• The replacement operations in URL cache must be
performed.
• The evicted URLs from URL cache are released.
• The requested object itself is not stored in disk cache at this
moment. Thus, no replacement in disk cache is needed.
52
Admission control: operations
• Cache misses for cacheable objects :
(2) if the URL of the requested object is stored in URL cache,
• its associated record in URL cache is removed, the
requested object is stored in disk cache, and the reference
count is set to one.
• Similarly, the replacement operations in disk cache must be
performed. The URLs of the evicted objects from disk cache
are stored in URL cache and again the replacement
operations in URL cache are performed.
53
Admission control: operations
• Cache misses for non-cacheable objects:
– For a cache miss, if the object is non-cacheable, the
operations are similar to original algorithm. If the evicted
object from memory cache is cacheable and it does not
exist in disk cache, its URL along with the reference
count is stored in URL cache.
– Notice that the proposed approach may lose some
possible hits in the disk cache when objects are accessed
the second time. However, it removes all the disk activity
of storing objects in the disk cache that will not be accessed
again before being evicted.
54
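Putting the cacheable-miss rule together, here is a simplified Python sketch; the dict-based URL cache and the disk_cache.store interface are hypothetical names for illustration, not Squid code:

import hashlib

def md5(url):
    return hashlib.md5(url.encode()).hexdigest()

def on_cacheable_miss(url, obj, url_cache, disk_cache):
    key = md5(url)
    if key not in url_cache:
        # first access: remember the URL and its reference count, do NOT store the object
        url_cache[key] = 1
    else:
        # URL seen before: admit the object into the disk cache and drop the URL record
        del url_cache[key]
        disk_cache.store(url, obj, ref_count=1)   # may trigger disk-cache replacement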
Admission control
• Efficient Management of URL Cache
– A separate hash table, similar to that used in the memory/disk
cache, is used in the URL cache to support efficient search
for the URL of a requested object.
– The MD5 of URL is employed as the search key.
– We employ a replacement policy that is based on the
URL access frequency.
– The least frequently accessed entry in URL cache is
first selected for replacement.
– A priority queue with access frequency as the key is a
suitable implementation for such a replacement policy.
55
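A minimal sketch of such a URL cache, using a hash table plus a lazily cleaned heap keyed by access frequency (class and method names are illustrative assumptions):

import heapq

class URLCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.freq = {}   # MD5(url) -> access frequency
        self.heap = []   # (frequency, key) entries; stale entries are skipped on pop

    def touch(self, key, count=1):
        self.freq[key] = self.freq.get(key, 0) + count
        heapq.heappush(self.heap, (self.freq[key], key))
        while len(self.freq) > self.capacity:
            f, k = heapq.heappop(self.heap)
            if self.freq.get(k) == f:     # evict the least frequently accessed URL
                del self.freq[k]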
Admission control
• Efficient Management of URL Cache
– Each entry of the URL cache records the MD5 of
URL, access frequency, and a few pointers for
facilitating priority queue and hash table data
structures.
– The required memory space for each entry in URL
cache is constant.
– The size of hash table and priority queue itself is
small and does not depend on the number of entries
hashed, thus can be ignored.
– Based on the size of the UC trace we studied in this paper,
keeping all the URLs of the requests from a one-day
period in the URL cache is reasonable. This accounts for
about 400k URLs. Therefore, assuming 80 bytes are needed
for each entry in the URL cache, about 32 MB of memory
is needed for the URL cache.
56
[Figure: hit ratio h(S) and effective hit ratio heff(S) versus cache size (1–32) for the CHU trace; hit ratio ranges roughly from 0.55 to 0.7]
57
Removal frequency
• On-demand: Run policy when the size
of the requested document exceeds the
free room in a cache. (take time to do
the removal)
• Periodically: Run policy every T time
units, for some T.
– If removal is time consuming
• Both on-demand and periodically: Run
policy at the end of each day and on-demand (Pitkow/Recker [13]).
58
On-demand
• Two arguments suggest that overhead of simply
using on-demand replacement will not be
significant.
– First, this class of removal policies maintains a sorted
list. If the list is kept sorted as the proxy operates,
then the removal policy merely removes the head of
the list, which should be a fast, constant-time operation.
– Second, a proxy server keeps read-only documents.
Thus there is no overhead for “writing-back" a
document, as there is in a virtual memory system
upon removal of a page that was modified since
being loaded.
59
How many to remove
• Removal process is stopped when the
free cache area equals or exceeds the
requested document size.
• Replace documents until a certain
threshold (Pitkow and Recker's comfort
level) is reached.
60