<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Scott&apos;s Blag Redux</title>
    <link rel="alternate" type="text/html" href="http://blag.dunedain289.com/" />
    <link rel="self" type="application/atom+xml" href="http://blag.dunedain289.com/atom.xml" />
    <id>tag:blag.dunedain289.com,2008-05-30://1</id>
    <updated>2008-07-30T07:53:08Z</updated>
    <subtitle>Now with 100% more speed and 100% less mod_rails!</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Open Source 4.1</generator>

<entry>
    <title>Patents are broken</title>
    <link rel="alternate" type="text/html" href="http://blag.dunedain289.com/2008/07/patents-nintendo.html" />
    <id>tag:blag.dunedain289.com,2008://1.12</id>

    <published>2008-07-23T17:01:14Z</published>
    <updated>2008-07-30T07:53:08Z</updated>

    <summary>http://arstechnica.com/news.ars/post/20080723-nintendo-cant-fight-off-patent-metroids-faces-injunction.html Looks like another company is losing to a patent troll. I keep thinking that Congress or the courts need to put a stop to this practice, but it&#8217;s so hard to define. Where do you draw the line between...</summary>
    <author>
        <name>Scott</name>
        
    </author>
    
        <category term="Intellectual Property" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Politics" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="intellectualproperty" label="intellectual property" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="politics" label="politics" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blag.dunedain289.com/">
        <![CDATA[<p>http://arstechnica.com/news.ars/post/20080723-nintendo-cant-fight-off-patent-metroids-faces-injunction.html</p>

<p>Looks like another company is losing to a patent troll.  I keep thinking that Congress or the courts need to put a stop to this practice, but it&#8217;s so hard to define.  Where do you draw the line between patent troll behavior and a legitimate inventor protecting his creation before he can bring it to market?  No one seems to have an answer yet, but we need one soon.  Perhaps patent lawsuits should have a stricter requirement for &#8216;standing&#8217; - the patent owner must have a product that is damaged by the defendant&#8217;s actions?  It would eliminate patent trolling, but it would also limit the usefulness of patents in protecting new products before they release.  </p>

<p>Unfortunately, there is little emphasis on patent reform - it&#8217;s a complicated issue with a headline-generation-quotient of 0.0 for politicians.  Plenty of reason there for them to stay away.</p>
]]>
        

    </content>
</entry>

<entry>
    <title>Dark Knight</title>
    <link rel="alternate" type="text/html" href="http://blag.dunedain289.com/2008/07/dark-knight.html" />
    <id>tag:blag.dunedain289.com,2008://1.11</id>

    <published>2008-07-20T06:36:34Z</published>
    <updated>2008-07-20T06:36:34Z</updated>

    <summary>I saw The Dark Knight today. Amazing. Absolutely amazing. Every cast member is perfect. Heath Ledger really does deliver an Oscar-winning performance, and the Nolan brothers wrote the best movie so far this year. Go see it....</summary>
    <author>
        <name>Scott</name>
        
    </author>
    
        <category term="Movies" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blag.dunedain289.com/">
        <![CDATA[<p>I saw <em>The Dark Knight</em> today. Amazing. Absolutely amazing. Every cast member is perfect. Heath Ledger really does deliver an Oscar-winning performance, and the Nolan brothers wrote the best movie so far this year. Go see it. </p>
]]>
        

    </content>
</entry>

<entry>
    <title>Netflix SQLite finished</title>
    <link rel="alternate" type="text/html" href="http://blag.dunedain289.com/2008/06/netflix-sqlite-finished.html" />
    <id>tag:blag.dunedain289.com,2008://1.10</id>

    <published>2008-06-04T05:07:07Z</published>
    <updated>2008-06-04T05:07:07Z</updated>

    <summary>So it&#8217;s finally done. I have a ~4GB SQLite database file that has all user-movie-rating triples, along with indices on users and movies. It took almost a week to get all that data in and indexed. Most of that time...</summary>
    <author>
        <name>Scott</name>
        
    </author>
    
        <category term="NetflixPrize" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="netflixprize" label="netflixprize" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sqlite" label="sqlite" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blag.dunedain289.com/">
        <![CDATA[<p>So it&#8217;s finally done. I have a ~4GB SQLite database file that has all user-movie-rating triples, along with indices on users and movies. It took almost a week to get all that data in and indexed. Most of that time was spent indexing. My next moves will be creating tables with per-movie and per-user average ratings. After that, I&#8217;m going to start working on SVD. </p>
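<p>A minimal sketch of the per-movie and per-user average tables, in Python with the standard <code>sqlite3</code> module. The <code>ratings</code> table layout, the index names, and the <code>movie_avg</code> and <code>user_avg</code> table names are assumptions made for illustration, and the four rows are toy data, not Netflix data.</p>
<pre>
```python
import sqlite3


def build_averages(conn):
    """Create per-movie and per-user average tables from a ratings table."""
    conn.executescript(
        """
        CREATE TABLE movie_avg AS
            SELECT movie_id, AVG(rating) AS avg_rating, COUNT(*) AS n
            FROM ratings GROUP BY movie_id;
        CREATE TABLE user_avg AS
            SELECT user_id, AVG(rating) AS avg_rating, COUNT(*) AS n
            FROM ratings GROUP BY user_id;
        """
    )


# Hypothetical schema: one row per user-movie-rating triple.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER)"
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(6, 1, 3), (6, 2, 5), (7, 1, 4), (10, 2, 1)],
)
# Indices on users and movies, as in the post.
conn.execute("CREATE INDEX idx_ratings_user ON ratings (user_id)")
conn.execute("CREATE INDEX idx_ratings_movie ON ratings (movie_id)")

build_averages(conn)
movie_avg = dict(conn.execute("SELECT movie_id, avg_rating FROM movie_avg"))
user_avg = dict(conn.execute("SELECT user_id, avg_rating FROM user_avg"))
print(movie_avg)
print(user_avg)
```
</pre>
<p>Storing the rating count next to each average is a design choice in this sketch; it makes it possible to tell later how much data stands behind an average.</p>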
]]>
        

    </content>
</entry>

<entry>
    <title>Security Theater</title>
    <link rel="alternate" type="text/html" href="http://blag.dunedain289.com/2008/06/security-theater.html" />
    <id>tag:blag.dunedain289.com,2008://1.9</id>

    <published>2008-06-02T00:19:02Z</published>
    <updated>2008-06-02T21:56:28Z</updated>

    <summary>So, I&#8217;ve now flown twice in 4 days. Austin -&gt; Midland Thursday, and Midland -&gt; Dallas -&gt; Austin on Sunday. Each time, I had to empty my pockets, check my (smallish) duffel because I like my own toothpaste, shaving cream...</summary>
    <author>
        <name>Scott</name>
        
    </author>
    
        <category term="Politics" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="politics" label="politics" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="securitytheater" label="security theater" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="travel" label="travel" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blag.dunedain289.com/">
        <![CDATA[<p>So, I&#8217;ve now flown twice in 4 days.  Austin -> Midland Thursday, and Midland -> Dallas -> Austin on Sunday.  Each time, I had to empty my pockets, check my (smallish) duffel because I like my own toothpaste, shaving cream and shampoo, take off my shoes and remove my laptop from its bag.  The more I have to do this nonsense, the more I realize how ridiculous it is.  I&#8217;m all for safety and security on airlines, but I think we need to take a leaf out of the Israeli book: harden the cockpits, lock them, and train the pilots extensively that you never open the door, and you land as soon as anything goes wrong.  Make planes a terrible target for terrorists, and you don&#8217;t need to go crazy checking people before they get on board.  The liquids thing today is the most ridiculous part.  Liquid explosives are terrible, expensive, and hard to get.  The 9/11 hijackers didn&#8217;t use them.  They just used box cutters.  Blocking liquids from carry-ons has also overloaded airline and airport baggage handling.  If they&#8217;re going to block something, why not block laptops?  I bet it&#8217;d be easy to build a blade inside a laptop somewhere.  With enough machining (not a hard skill to learn), it&#8217;d probably be invisible on an X-ray.  Or a battery could be turned into a bomb pretty easily. Laptops with two batteries would even keep working.  Who really thinks airport security is going to know to check that both batteries register to the operating system, especially if it&#8217;s Linux?</p>

<p>A few years ago, I took Amtrak trains between Washington, D.C. and New York.  Never went through a metal detector.  Never sent my bags through an X-ray machine.  And, I never felt insecure, simply because trains are a terrible target.  You can&#8217;t do anything with them if you hijack them, and they&#8217;re always on the ground.  We need to either make planes the same type of terrible target, or we need to go back to trains.  That&#8217;s the only way to actually secure our travel. </p>
]]>
        

    </content>
</entry>

<entry>
    <title>SQLite is slow</title>
    <link rel="alternate" type="text/html" href="http://blag.dunedain289.com/2008/05/sqlite-is-slow.html" />
    <id>tag:blag.dunedain289.com,2008://1.7</id>

    <published>2008-05-31T18:57:26Z</published>
    <updated>2008-05-31T19:02:37Z</updated>

    <summary>Since this is such breaking news at this point. SQLite is really, really slow creating indices on big tables. It&#8217;s still trying to create the first index on users. I&#8217;m about to kill it, wipe the database out, then set...</summary>
    <author>
        <name>Scott</name>
        
    </author>
    
        <category term="NetflixPrize" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="netflixprize" label="netflixprize" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sqlite" label="sqlite" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blag.dunedain289.com/">
        <![CDATA[<p>Since this is such breaking news at this point.  SQLite is really, really slow creating indices on big tables.  It&#8217;s still trying to create the first index on users.  I&#8217;m about to kill it, wipe the database out, then set the indices up to be created while inputting the data.  Maybe that will go faster.  </p>
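<p>A rough sketch of how the two orderings could be compared, using Python and the standard <code>sqlite3</code> module. The table layout, index names, and synthetic rows are assumptions made for illustration, and timings on a small in-memory table will not necessarily predict what happens with 100 million rows on disk.</p>
<pre>
```python
import sqlite3
import time


def load(rows, index_first):
    """Load rows into a fresh in-memory DB.

    Returns (elapsed_seconds, row_count, index_names).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER)"
    )
    index_sql = [
        "CREATE INDEX idx_user ON ratings (user_id)",
        "CREATE INDEX idx_movie ON ratings (movie_id)",
    ]
    start = time.perf_counter()
    if index_first:
        # Indices exist up front, so every insert also updates them.
        for sql in index_sql:
            conn.execute(sql)
    with conn:  # one transaction for the whole bulk insert
        conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
    if not index_first:
        # Indices are built in one pass after all rows are in.
        for sql in index_sql:
            conn.execute(sql)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM ratings").fetchone()[0]
    names = sorted(
        r[0]
        for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
    )
    conn.close()
    return elapsed, count, names


# Synthetic stand-in data, not the Netflix set.
rows = [((i * 7919) % 5000, i % 300, 1 + i % 5) for i in range(20000)]
t_first, n_first, idx_first = load(rows, index_first=True)
t_after, n_after, idx_after = load(rows, index_first=False)
print(f"index first: {t_first:.3f}s, index after: {t_after:.3f}s")
```
</pre>
<p>Both runs end with the same rows and the same two indices; only the order of the work differs, so the printed times are the thing to look at.</p>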
]]>
        

    </content>
</entry>

<entry>
    <title>I just finished installing Movable Type 4!</title>
    <link rel="alternate" type="text/html" href="http://blag.dunedain289.com/2008/05/i-just-finished-installing-mov.html" />
    <id>tag:dunedain289.com,2008:/scotts_blag//1.1</id>

    <published>2008-05-30T20:58:35Z</published>
    <updated>2008-05-31T19:06:06Z</updated>

    <summary>Because while I love Ruby on Rails, it&#8217;s just too damn slow to run my blog on my Slicehost 256 MB VPS, at least with mod_rails, and I&#8217;m too lazy to figure out Mongrel or any of the other Rails...</summary>
    <author>
        <name>Scott</name>
        
    </author>
    
        <category term="Random" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blag.dunedain289.com/">
        <![CDATA[<p>Because while I love Ruby on Rails, it&#8217;s just too damn slow to run my blog on my <a href="http://www.slicehost.com">Slicehost</a> 256 MB VPS, at least with mod_rails, and I&#8217;m too lazy to figure out Mongrel or any of the other Rails systems.  It also seems less than stable, especially with how much I&#8217;m messing with it.  Plus, Movable Type has an <a href="http://plugins.movabletype.org/imt">iPhone interface plugin</a>.  </p>
]]>
        

    </content>
</entry>

<entry>
    <title>NSTX</title>
    <link rel="alternate" type="text/html" href="http://blag.dunedain289.com/2008/05/nstx.html" />
    <id>tag:dunedain289.com,2008:/scotts_blag//1.6</id>

    <published>2008-05-30T06:14:02Z</published>
    <updated>2008-05-31T07:22:51Z</updated>

    <summary> So, NSTX is awesome. I started playing with this just before I left to come home, but my DNS changes didn&apos;t have time to propagate until tonight. It&apos;s actually quite easy to set up, and surprisingly fast - at...</summary>
    <author>
        <name>Scott</name>
        
    </author>
    
        <category term="Hacking" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="netflixprize" label="netflixprize" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="nstx" label="nstx" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sqlite" label="sqlite" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blag.dunedain289.com/">
        <![CDATA[<div class="content">
    <p>So, <a href="http://thomer.com/howtos/nstx.html">NSTX</a>
is awesome. I started playing with this just before I left to come
home, but my DNS changes didn't have time to propagate until tonight.
It's actually quite easy to set up, and surprisingly fast - at least,
from where I'm currently testing it (my home in Midland). Definitely
usable. There's also an IP-over-ICMP program; I may look into it next.
That system seems to have some advantages over the DNS system, and
could be an interesting project to hack on. It looks easier to set up,
too - no DNS changes that require propagation.</p>

<p>Netflix update: still creating the users index. I think creating the
index while inserting data may have been a better idea after all. Top
reports 34:54.75 runtime so far. I'll let it keep running and check it
again tomorrow.</p>

      </div> ]]>
        
    </content>
</entry>

<entry>
    <title>Netflix Update</title>
    <link rel="alternate" type="text/html" href="http://blag.dunedain289.com/2008/05/netflix-update.html" />
    <id>tag:dunedain289.com,2008:/scotts_blag//1.5</id>

    <published>2008-05-28T06:13:12Z</published>
    <updated>2008-05-31T06:13:43Z</updated>

    <summary> Importing the ratings took about 2 hours. That was with a little tuning, too. Now I&apos;m trying to generate those indices. It&apos;s going to take just as long. Ugh. Too much data....</summary>
    <author>
        <name>Scott</name>
        
    </author>
    
        <category term="NetflixPrize" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="netflixprize" label="netflixprize" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sqlite" label="sqlite" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blag.dunedain289.com/">
        <![CDATA[<div class="content">
    <p>Importing the ratings took about 2 hours.
That was with a little tuning, too. Now I'm trying to generate those
indices. It's going to take just as long. Ugh. Too much data. </p>

      </div> ]]>
        
    </content>
</entry>

<entry>
    <title>Netflix Prize Again</title>
    <link rel="alternate" type="text/html" href="http://blag.dunedain289.com/2008/05/netflix-prize-again.html" />
    <id>tag:dunedain289.com,2008:/scotts_blag//1.4</id>

    <published>2008-05-28T06:11:00Z</published>
    <updated>2008-05-31T06:12:47Z</updated>

    <summary> So, I took another look at the training data for this contest. Absolutely freaking enormous. 100 Million ratings from 17000 movies and nearly 500K users. Unfortunately, the user ids run from 1 to 2.6M and have lots of gaps....</summary>
    <author>
        <name>Scott</name>
        
    </author>
    
        <category term="NetflixPrize" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="netflixprize" label="netflixprize" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="sqlite" label="sqlite" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blag.dunedain289.com/">
        <![CDATA[<div class="content">
    <p>So, I took another look at the training
data for this contest. Absolutely freaking enormous. 100 Million
ratings from 17000 movies and nearly 500K users. Unfortunately, the
user ids run from 1 to 2.6M and have lots of gaps. So, importing the
ratings as a matrix in a C app is not an option - the matrix would be
over 8.5 GB, even using just chars. So, I thought SQL might be good -
the biggest requirement here is being able to index into the data like a matrix.
SQL allows that well enough, especially with indexes on both the user and movie
columns. I started with SQLite3, and it seemed slow. So now I'm trying
MySQL. It's slower. A <strong>lot</strong> slower. Either I suck at
configuring it (possible, but it shouldn't be this bad out of the box)
or it's just slower than Christmas for lots of INSERTs. I may go back
to SQLite. Either way, I need an easy read-only DB for this part. <a href="http://en.wikipedia.org/wiki/Cdb_%28software%29">CDB</a>
seemed like a good option - it's really quick, read-only, and even
fast to create. The downside is that it is hash-table only. Since I need
a matrix-style system, that seems like a bad plan. I could just use
"userid.movieid" as the key, but then I can't read in all the ratings
from a user or ratings on a movie. Even duplicating the data in
different views doesn't help. This needs to be either SQL or a matrix -
there are things I'll need to do that only work with those access
methods. The rest of this program can be done in C/C++ without
databases, but this really needs to be a DB. Definitely going back to
SQLite though. And then I'm going to let this run overnight. Because
it's going to take <strong>that freaking long</strong> to import this data.  </p>
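    <p>The size estimate above can be checked with a little arithmetic. This is a minimal Python sketch using the rounded counts from this post (17000 movies, 500K users, ids up to 2.6M). The function names are made up for illustration, and the id remapping is one possible way to deal with the gaps, not something already in the code.</p>
<pre>
```python
def dense_matrix_bytes(n_users, n_movies, bytes_per_cell=1):
    """Size of a dense users-by-movies matrix, one char per rating by default."""
    return n_users * n_movies * bytes_per_cell


def dense_index(ids):
    """Map sparse ids with gaps onto 0..n-1, in sorted order."""
    return {raw: i for i, raw in enumerate(sorted(set(ids)))}


# One row per actual user: 500K users by 17000 movies.
packed = dense_matrix_bytes(500_000, 17_000)
# One row per possible user id (1 to 2.6M), gaps included.
by_raw_id = dense_matrix_bytes(2_600_000, 17_000)
print(packed, by_raw_id)

# Toy ids with gaps and a duplicate, standing in for the real user ids.
user_index = dense_index([6, 7, 10, 2_600_000, 7])
print(user_index)
```
</pre>
    <p>Even with the gaps squeezed out, the dense matrix comes to 8.5 billion bytes, which is why a plain in-memory matrix is off the table.</p>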

      </div> ]]>
        
    </content>
</entry>

<entry>
    <title>Netflix Prize</title>
    <link rel="alternate" type="text/html" href="http://blag.dunedain289.com/2008/05/netflix-prize.html" />
    <id>tag:dunedain289.com,2008:/scotts_blag//1.3</id>

    <published>2008-05-27T06:09:30Z</published>
    <updated>2008-05-31T06:10:17Z</updated>

    <summary> So, the Netflix Prize has me intrigued. Really, really intrigued. I&apos;m not good enough at machine learning, stats, or whatever to win, but I&apos;m going to be playing with it in the future. Code to follow. I&apos;ll be implementing...</summary>
    <author>
        <name>Scott</name>
        
    </author>
    
        <category term="NetflixPrize" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="netflixprize" label="netflixprize" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://blag.dunedain289.com/">
        <![CDATA[<div class="content">
    <p>So, the <a href="http://netflixprize.com/">Netflix Prize</a>
has me intrigued. Really, really intrigued. I'm not good enough at
machine learning, stats, or whatever to win, but I'm going to be
playing with it in the future. Code to follow. I'll be implementing
something like <a href="http://sifter.org/%7Esimon/journal/20061211.html">this</a> at first.</p>

      </div> ]]>
        
    </content>
</entry>

<entry>
    <title>Welcome</title>
    <link rel="alternate" type="text/html" href="http://blag.dunedain289.com/2008/05/welcome.html" />
    <id>tag:dunedain289.com,2008:/scotts_blag//1.2</id>

    <published>2008-05-26T06:06:43Z</published>
    <updated>2008-05-31T06:08:53Z</updated>

    <summary>So, I finally created a blog. We&#8217;ll see how this goes&#8230;I&#8217;ll post something a little more interesting later today. Just wanted to say Welcome!...</summary>
    <author>
        <name>Scott</name>
        
    </author>
    
        <category term="Random" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://blag.dunedain289.com/">
        <![CDATA[<p>So, I finally created a blog.  We&#8217;ll see how this goes&#8230;I&#8217;ll post something a little more interesting later today.  Just wanted to say Welcome!</p>
]]>
        

    </content>
</entry>

</feed>
