Tentacular magic for SL (Squidifying your viewer)

You might have seen Tateru Nino’s article Proxying Second Life HTTP Textures with Squid … and then gotten frustrated for any of a number of reasons.  This mini-article expands on Tateru’s excellent post, adding two big hints: first, use the right version of Squid, and second, get it working without fancy firewall rules.  The bottom line is that with these two extra tricks you can get a (much!) larger-than-1G texture cache working with existing SL client software, and even get it all running on a single machine!

Obviously, this procedure will only really be interesting if you are running a Second Life viewer using HTTP textures – I’ve tested using both Linux and Windows versions of a number of official Viewer 2s as well as Phoenix (and Firestorm).

So hop in your deep sea submarine and follow me below the waves in search of the elusive… Second Life Squid! …

NOTE: There is a follow-up on the way HERE

Latest isn’t always greatest!

First off, don’t use Squid 3!  I know, I know – why would you want to use a crufty old squid when there is a nice bright and shiny new squid?  Well, here’s the thing: first, texture requests are always addressed to the sim where the texture is seen.  That is generally fine, but a straightforward caching approach would mean that the same texture seen on two different sims gets cached twice.  For common textures this is a big waste of space and bandwidth.

But wait!  It gets even worse!  Each request is identified by a “cap” UUID that authenticates the requester to the sim – also a fine thing, but it means that if two users see the same texture on the same object, on the same sim, the retrieval requests will still be different and so will be cached separately.  Tateru’s article discusses these problems at the end, in the “caching caveats” section, and Robin Cornelius mentions the StoreURLRewrite fix in the comments (see below for my solution).

However, it turns out that when Squid was rewritten for version 3, the developers didn’t port the very feature we need.  Instead, you want to back yourself up to Squid 2.7, which is likely still available for whatever OS you’d like.

What if I don’t have a Linux firewall?

Yes, you should always have a firewall protecting your Windows machine from the jungle of the Intertubes (and if you don’t, please don’t send me email or MS Office documents!).  Most people just use commodity routers with integrated firewalls at home.  Given an efficient texture cache, you might also want to consider running that cache on the same machine as your SL client: yes, it would compete with your client for CPU time, but you could have as large a cache as you’d like and save your bandwidth for other purposes.  Also, such a scheme would allow an SL laptop to get the benefits of a big cache without losing it when it moves from network to network.

Obviously you could not use the approach that Tateru suggests – SOCKS5 or an explicit HTTP proxy setting inside your viewer would enable this, but most http-texture-enabled viewers have broken settings, only applying the proxy to the internal browser when it looks at web pages (and media streams?).  But it turns out that there is a simpler way available, and you can use it now.

The viewers use a standard open-source library, libCurl, as their HTTP client.  libCurl has a neat feature: by default, it looks at various environment variables to configure proxies.  While this feature can be disabled, it is perfectly functional in the SL viewers that I’ve tried.  That lets you route all HTTP use in the viewer through whatever proxy you’d like, either on a LAN machine or on localhost, simply by setting the right environment variable.
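
For instance, in bash terms (this uses the proxy port we’ll configure in the recipe below, and assumes your Linux viewer’s launch wrapper is called secondlife – adjust both to taste):

    $ export http_proxy=http://127.0.0.1:3178   # point libCurl-based apps at the local squid
    $ ./secondlife                              # any viewer started from this shell inherits the setting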

A 20G local cache for your SL

So here’s my recipe.  There are lots of variations possible here, and I’m sure there is a lot of room for improvement, but the bottom line is that this works, today.  The scenario that I’m describing is setting up a 20G squid texture cache for use only by viewers running on the same machine.  I’ll focus on doing this on Linux (since that’s what I’ve set up), but other OSes are actually pretty similar – I’ll include notes on them along the way.

  1. Install Squid 2.7 (NOT 3.anything!).  Don’t do anything fancy here, just get it installed as is.
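    On a Debian or Ubuntu box, for example, the stock squid package may well do the job, but package names and versions vary by distribution, so treat this as a sketch and double-check that what you get really is 2.7 and not 3.x:
    $ sudo apt-get install squid
    $ squid -v     # should report 2.7.STABLEx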
  2. Make the changes to the configuration file /etc/squid/squid.conf.  I like interspersing the changes into the “right” section wherever possible; in my own file, I’ve marked my additions with “(vex)”. Note that anything after a “#” is a comment and is ignored by squid. Also, order matters! In particular, if the first rule that applies to a URL says “don’t allow”, then it will not be allowed, even if a later rule would negate that restriction.
    1. add (or change) the cache_dir line to specify a biggish cache. I promised 20G, so make sure you’ve got that much spare space and then do this:
      # comment out the default (if it isn't already)- 100M just isn't enough! (vex)
      #cache_dir ufs /var/spool/squid 100 16 256
      # big cache for textures (vex)
      cache_dir aufs /var/spool/squid 20000 16 256

      I’m specifying a cache of 20000 megabytes there, distributed inside 16 top-level directories, each with 256 subdirectories to hold the actual cache files. Play around if you’d like, but don’t go nuts right away. I’m also using threaded (aufs) cachefile access rather than plain ufs, as I like threads.

    2. Add the http textures port to the list of safe http ports to use – probably not required because it is already covered in most default configuration files, but worth making explicit. Add to the acl section:
      acl Safe_ports port 12046   # Second Life HTTP textures (vex)
    3. If you want to be able to use the cache from any machine on your LAN, you’ll want to be sure that the following lines aren’t commented:
      acl localnet src 192.168.0.0/16  # my standard internal network (vex)
      http_access allow localnet #allow access by anyone on my lan (vex)
    4. Set up a transparent proxy service on port 3178 – this is the port we’ll tell the viewer to use:
      # (vex)
      http_port 3178 transparent
    5. Tell squid how to handle textures! Position is important here – I added this line just before the first refresh_pattern line in the file.
      # this needs to come before the ? and .  SL Texture cache (vex)
      refresh_pattern agni.lindenlab.com	10080 90% 44640 ignore-reload ignore-private ignore-no-cache

      Here we say “any resources that point to agni.lindenlab.com (i.e. main grid textures) can be cached for at least a week (10080 minutes) and up to a month or 90% of their age, regardless of what the server says we should do” [Edit: removed ignore-no-store directive, left over from dancing with Squid 3.x].

    6. Viewers have a mechanism to get partial textures by requesting just the first N bytes. That’s great for the viewer, but we only want to cache the whole texture, so we need to fetch the whole thing every time by setting range_offset_limit to -1:
      # (vex)
      range_offset_limit -1
    7. Finally, we need to hook in the rewriter that will transform the requested URLs into the form that is actually cached. You only get one store rewriter, but it is a real (external) program that takes lines of URLs as input and spits out transformed lines as output (this can go at the very end):
      # rewrite rules for SL textures! (vex)
      acl store_rewrite_list dstdomain .agni.lindenlab.com
      storeurl_access allow store_rewrite_list
      storeurl_access deny all
      storeurl_rewrite_program /etc/squid/url_rewrite_textures.pl

      This says, invoke the store rewriter on urls that include .agni.lindenlab.com by calling the script in /etc/squid/url_rewrite_textures.pl.

  3. The “store rewriter” is the special sauce that makes it all worthwhile. As Tateru points out, the URLs that the viewer requests don’t lend themselves very well to caching. But with the above recipe (and Squid 2.7!), we can tell squid to rewrite those URLs for storage purposes only! This is an important point – we are not changing the URLs used to request or retrieve anything; we are only telling squid under which name content should be cached. You can do any transformation you’d like here. What I do below is store any texture resource under the name “http://texture.lindenlab.com.INTERNAL/(texturekey)”. I created this as a perl script and put it in /etc/squid with the squid.conf file. The location and name aren’t critical, but the script needs to be executable and match the name given at the end of the squid.conf modifications above. Without further ado, here’s my script for /etc/squid/url_rewrite_textures.pl :
    #!/usr/bin/perl
    # Store-URL rewriter: reads request URLs on stdin (one per line) and prints
    # the URL under which squid should cache the response (one per line).
    $| = 1;    # unbuffered output - squid expects an immediate answer for each line
    while (<>) {
        chomp;
        if (m/http:\/\/.*lindenlab\.com:12046\/cap.*\?texture_id=(.*)/) {
            # texture request: cache it under one canonical name per texture key
            print "http://texture.lindenlab.com.INTERNAL/" . $1 . "\n";
        } else {
            # anything else: cache under the URL as given
            print $_ . "\n";
        }
    }

    As an example, if your viewer requests texture XXX from sim YYY, it might send an HTTP texture request like http://sim4035.agni.lindenlab.com:12046/cap/fab41b56-1173-280f-c39c-2e902091636c/?texture_id=3213b2c9-1540-bdf4-c90b-b782257418e9 . The cache sees that it is an “agni.lindenlab.com” address and invokes the store rewriter, which transforms it into http://texture.lindenlab.com.INTERNAL/3213b2c9-1540-bdf4-c90b-b782257418e9. Then it looks in its set of cached documents to see if it has THAT URL already stored. If so, it sends it directly back to the viewer. If not, it passes the original request through to Linden’s servers and waits for a response. When the response comes back, it passes those bits back to the viewer and saves those bits in the cache under the shortened name.

    The cool thing here is that if someone else on my LAN using my cache happens to see that same texture on a different sim, their viewer might request http://sim1370.agni.lindenlab.com:12046/cap/da95a624-231a-11e0-bbae-4fc0493c85e5/?texture_id=3213b2c9-1540-bdf4-c90b-b782257418e9 (note the different hostname and the different UUID after /cap/!). The cache shortens that to the same URL and gets a hit, even though a very different URL was passed in, letting it respond without using your precious WAN bandwidth, without waiting for sim1370 to fetch the texture from Linden’s asset servers, and leaving the sim’s computer more computrons for interesting stuff like physics and script running.
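
    A quick way to sanity-check the rewriter before squid ever calls it is to feed it the example URL above by hand and make sure the shortened form comes back out (adjust the path if you put the script somewhere else):

    $ echo "http://sim4035.agni.lindenlab.com:12046/cap/fab41b56-1173-280f-c39c-2e902091636c/?texture_id=3213b2c9-1540-bdf4-c90b-b782257418e9" | /etc/squid/url_rewrite_textures.pl
    http://texture.lindenlab.com.INTERNAL/3213b2c9-1540-bdf4-c90b-b782257418e9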

  4. initialize the cache:
    $ sudo squid -z

    All this does is set up your cache directory, mainly by creating all the appropriate directories and starter cache files.
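
    If you’re curious, listing the cache directory afterwards should show the 16 hex-numbered top-level directories that squid just created:

    $ ls /var/spool/squid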

  5. set the environment variable.  For purposes of discussion, I’m going to assume you are using bash or similarly bourne-shellish syntax:
    $ export http_proxy=http://127.0.0.1:3178

    This, of course, is the incantation that tells libCurl to use 127.0.0.1 (also known as localhost) port 3178 as an http proxy.  ANY application that uses libCurl and executes with that setting will use it.  This might be a problem if Squid itself used libCurl, but we’re lucky there.  Of course, if you happen to have set up your cache on another machine on your lan instead of your client, you can put your cache machine’s IP address in there.  For instance, at home, I have a setup where my Windows 7 SL machine points at a squid cache that I’ve set up on a little linux box for this purpose:

    Computer properties ->
    Advanced System Settings ->
    (advanced tab) ->
    Environment Variables ->
    User Variables (new) ->
    Variable=http_proxy, Value=http://192.168.1.99:3178
  6. test using the curl command:
    $ curl http://cnn.com

    Did it print the HTML of the CNN front page? The other thing to do is check the squid logs:

    $ sudo tail /var/log/squid/access.log

    Note the last entry – it should mention cnn, indicating that the request actually went through our cache.
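
    If you’d rather take the environment variable out of the picture while testing, curl can also be pointed at the proxy explicitly with its -x option (substitute your cache machine’s address if squid lives elsewhere on your LAN):

    $ curl -x http://127.0.0.1:3178 http://cnn.com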

  7. test using SL!  You are probably fine running like this, but you can always set the http_proxy value in a shell script that runs your viewer (or a bat script if you are on Windows), as in the sketch below.  And of course, when the various viewers introduce a working http proxy setting, you can stop using the environment variable and point the viewer at the squid you’ve just set up.
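    For example, a minimal launch wrapper might look like this (the viewer path is just a placeholder – point it at wherever your viewer actually lives):

    #!/bin/bash
    # run the viewer through the local squid texture cache
    export http_proxy=http://127.0.0.1:3178
    # placeholder path - substitute your own viewer's launch script
    exec "$HOME/secondlife/secondlife" "$@"
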
    I like to do a “tail -f /var/log/squid/access.log” to watch the textures getting cached – any line with MISS is a texture getting grabbed from LL… any line with HIT is one that the cache found that didn’t have to go back to the Linden servers. Yay!
  8. Send me comments!

PS.  Techwolf Lupindo wrote up an overlapping approach here: http://wiki.phoenixviewer.com/doku.php?id=squid_proxy_cache

17 Replies to “Tentacular magic for SL (Squidifying your viewer)”

  1. Thanks for the extra info. I did as you suggested on my Linux server (Ubuntu 10.10) with Squid 2.7, and when I tried to init the cache, got this:

    root@linux1:/etc/squid# nano squid.conf
    root@linux1:/etc/squid# squid -z
    2011/01/17 19:07:26| parse_refreshpattern: Unknown option ‘agni.lindenlab.com’: ignore-no-store
    2011/01/17 19:07:26| Creating Swap Directories
    root@linux1:/etc/squid# service squid start
    squid start/running, process 28053

    The server started ok, just threw the error on refresh_pattern. Seems to be working okay regardless though I think it’s an error that needs fixing.

    1. The particular options vary considerably depending on version – which version are you actually running?
      [edit]
      Strange – it looks like that was left over from my trying to beat Squid 3 into submission. Indeed, ignore-no-store is a new 3.0 feature.

      1. Ah, found it, typo in that line in squid.conf. Works perfectly now. Sorry for the bother. I’m going to adapt this to run on SquidMan on the Mac as well.

        1. With the ignore-no-store? Doesn’t seem to be critical in any case – I was just carrying forward Tateru’s recipe as much as possible. Next on my queue is to put together an installer-based package for windows, pointing to localhost.

  2. For those using SquidMan on the Macintosh, here is the altered template file. Make these further changes to it before using:

    - replace USERNAME (last line) with your own account name
    - change filename of url_rewrite_textures.pl to squid_url_rewrite_textures.pl
    - put squid_url_rewrite_textures.pl into the folder referenced in the last line

    # ----------------------------------------------------------------------
    # WARNING - do not edit this template unless you know what you are doing
    # ----------------------------------------------------------------------

    cache_peer %PARENTPROXY% parent %PARENTPORT% 7 no-query no-digest no-netdb-exchange default

    # changed cache_dir for SL
    #cache_dir ufs %CACHEDIR% %CACHESIZE% 16 256
    cache_dir aufs %CACHEDIR% 20000 16 256

    maximum_object_size %MAXOBJECTSIZE%
    http_port %PORT% transparent
    visible_hostname %VISIBLEHOSTNAME%

    cache_access_log %ACCESSLOG%
    cache_log %CACHELOG%
    cache_store_log %STORELOG%
    pid_filename %PIDFILE%

    hierarchy_stoplist cgi-bin ?
    acl QUERY urlpath_regex cgi-bin
    no_cache deny QUERY

    # access control lists
    %ALLOWEDHOSTS%
    %DIRECTHOSTS%
    acl manager proto cache_object
    acl localhost src 127.0.0.1/255.255.255.255
    acl SSL_ports port 443 563 8443

    # added 12046 to Safe_ports for SL
    acl Safe_ports port 80 81 21 443 563 70 210 1025-65535 280 488 591 777 12046
    acl CONNECT method CONNECT

    # only allow cachemgr access from localhost
    acl localnet src 192.168.1.0/24

    # added localnet for SL
    http_access allow manager localhost localnet

    http_access deny manager

    # deny requests to unknown ports
    http_access deny !Safe_ports

    # deny CONNECT to other than SSL ports
    http_access deny CONNECT !SSL_ports

    # client access

    # added localnet for SL
    http_access allow localhost localnet
    %HTTPACCESSALLOWED%
    http_access deny all

    # direct access (bypassing parent proxy)
    %ALWAYSDIRECT%
    always_direct deny all

    # other adds for sl
    refresh_pattern agni.lindenlab.com 10080 90% 44640
    range_offset_limit -1
    acl store_rewrite_list dstdomain .agni.lindenlab.com
    storeurl_access allow store_rewrite_list
    storeurl_access deny all

    # change USERNAME to your own account
    storeurl_rewrite_program /Users/USERNAME/Library/Preferences/squid_url_rewrite_textures.pl

  3. Comments on Windows setup, using the official binaries from http://squid.acmeconsulting.it/

    When using Squid on the same machine as the viewer, you need to add

    “http_access allow localhost”

    (right after or in place of the localnet line specified by Vex).

    You need Perl to use Vex’s rewrite script. You can get a Windows package at http://strawberryperl.com/ if you don’t have one.

    Squid on Windows can’t execute scripts directly – you have to pass the path to the interpreter. For a default Strawberry Perl installation, this looks like:

    storeurl_rewrite_program c:/strawberry/perl/bin/perl.exe c:/squid/etc/url_rewrite_textures.pl

    Depending on your version of Windows (I use Vista), setting the http_proxy variable in Control Panel may not take effect until a reboot of the machine.

  4. Well I almost have this working
    slight issue

    2011/01/28 08:45:07| helperHandleRead: unexpected reply on channel -1 from store_rewriter #254 ‘http://texture.lindenlab.com.INTERNAL/3d5d1413-7258-3295-dc03-5abcb93e5239 10.254.253.5/carrot – GET – myip=10.254.253.1 myport=3178’

    Is this a case of TMI from squid to the perl script ?
    and

    perl 2011/01/28 08:50:10| WARNING: All store_rewriter processes are busy.
    2011/01/28 08:50:10| WARNING: up to 351 pending requests queued
    2011/01/28 08:50:10| Consider increasing the number of store_rewriter processes to at least 605 in your config file.
    WARNING! Your cache is running out of filedescriptors

    slackware 13.1
    perl 5.10.1
    emerald 1.5.2.818 (win)
    Squid 2.7 stable 9

    1. Well, if the processes are busy, it suggests that you aren’t getting one line of output per line of input. Squid is supposed to ignore extra stuff on the line, but you could always trim it off – for URL rewriting some of those extra params are useful, but for our store rewriter you don’t need anything but the URL, so you could put something like “([^ ]*)\s” in there instead of (.*) to match any string of non-space characters. I’d recommend that you try sending your rewriter a variety of likely texture URLs and make sure it is sending back something sane.

      1. I fed it /etc/passwd, and got it back unchanged
        which I took to mean that it likes plain text

        cat /etc/passwd | /usr/squid/url_rewrite_textures.pl > /tmp/passwd ; diff /etc/passwd /tmp/passwd

        the files are the same…..

        # echo "http://sim4035.agni.lindenlab.com:12046/cap/fab41b56-1173-280f-c39c-2e902091636c/?texture_id=3213b2c9-1540-bdf4-c90b-b782257418e9" | ../url_rewrite_textures.pl
        http://texture.lindenlab.com.INTERNAL/3213b2c9-1540-bdf4-c90b-b782257418e9

        changing to "([^ ]*)\s" breaks the rewriter

        if I comment out "storeurl_access allow store_rewrite_list"
        squid works and logs all traffic …

        How can I debug this ?

        perl knowledge < 0

        Can I add a line of perl to append the input to a text file? (Well, I can’t – can you tell me how?)

        TIA

        Beyond

          1. Hmm – still seems odd that you were having these problems. FWIW, I’ve only used the default storeurl_rewrite_concurrency=0.

            I think the pattern “(\S+)” would have worked also, but it sounds like you had some settings different from the default (at least as provided by ubuntu and windoze).

          2. Ah. Yes.
            # “concurrency” concurrency
            # The number of concurrent requests the helper can process.
            # The default of 0 is used for helpers who only supports
            # one request at a time. Setting this changes the protocol used to
            # include a channel number first on the request/response line, allowing
            # multiple requests to be sent to the same helper in parallell without
            # wating for the response.
            # Must not be set unless it’s known the helper supports this.
            If you set it to something other than 0, the line passed to the rewriter will apparently be _prefixed_ by the helper number – seems like a really bad design decision, but what do I know. Anyway, yes – that seems to be the problem.
