Eclipse + Leopard = Crash?

I recently upgraded to Leopard on my work MacBook Pro. One of the first things I ran into in Eclipse was that any time I used Open Resource (Shift + APPLE + R) to open a file, Eclipse would crash with a nasty exception.

Exception Type: EXC_BAD_ACCESS (SIGBUS)

After doing a bit of googling, this seems to be a bug in SWT for Leopard (see here). I upgraded my Eclipse to the 3.4 Stream Stable Build found at http://download.eclipse.org/eclipse/downloads/ and my problems seem to have gone away.

It’s got a pretty cool new splash screen too 🙂


eclipse-3.4 splash screen


Social Matchbox DC

Last week, I attended Social Matchbox DC with Brent and Brendan.

We all thought it would be a showcase of startups around DC talking about the cool things they are doing. To our surprise, it was more of a job fair than a social gathering. However, they had free pizza so I can’t complain too much. Still, it’s refreshing to see there is a startup community in the DC area, where it seems like everyone and their uncle works for the government. Clearspring and Freewebs were both there. I tried out making a widget on Clearspring’s platform and making a web site using Freewebs a while back. Both are very cool companies.

Although the DC area is more of a Government Valley, I wish there were more venues around the area that allowed startups and hackers to get together.


Open Source and Caching Algorithms

I wanted to go through the exercise of contributing to open source with a project of my own. After thinking about it for probably 15 minutes, I decided I wanted to try to build my own caching system in Java. Too bad I knew next to nothing about caching. I went off and did some research.

There are certain well-known algorithms that have become popular when implementing caches. Given that caches have a finite size (either you run out of space or memory), cache algorithms are used to manage the cache. These algorithms determine things like how long an item remains in the cache and what gets booted out of the cache when it reaches its maximum size. According to Wikipedia, the most efficient caching algorithm “would be to always discard the information that will not be needed for the longest time in the future”. You need to take a look at the data you want to cache before deciding on a caching strategy. Do you need to support random access (the access to the data is uniformly distributed) or sequential access (you’re interested in large chunks of data at a time)? Is certain data accessed more often than other pieces of data?

Here are a few common algorithms:

  • Least Recently Used (LRU) – the items that haven’t been accessed the longest get the boot first. This is implemented by keeping a timestamp for all items in the cache. Check out this simple LRU implementation.
  • Least Frequently Used (LFU) – the items that are sitting in the cache but have been accessed the least are booted out first. This is implemented by a counter to see how often an item is accessed.
  • First In First Out (FIFO) – the item that first entered the cache is the first to go when it gets full. This can be easily implemented by a queue.
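To make the LRU idea concrete, here is a minimal sketch in Java. Note that it does not use the timestamp approach described above: it assumes the JDK's `LinkedHashMap` in access-order mode, which keeps entries ordered from least to most recently accessed. The class names `LruCache` and `LruDemo` are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache sketch (hypothetical class) built on LinkedHashMap's access-order mode. */
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: entries are ordered from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Consulted after each insertion; returning true drops the least recently used entry
        return size() > maxEntries;
    }
}

class LruDemo {
    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // "a" is now the most recently used entry
        cache.put("c", 3);  // cache is over capacity, so "b" gets the boot
        System.out.println(cache.keySet());
    }
}
```

A FIFO cache can be sketched the same way by passing `false` for the access-order flag, so that entries stay in insertion order regardless of reads. This sketch is not thread-safe.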

Of course, there are projects like EHCache and OSCache out there that have addressed this issue.

OSCache provides a FIFO and an LRU implementation of a cache.

In addition to FIFO and LRU, EHCache provides an LFU implementation of a cache.

Thinking about how these algorithms work, it is easy to see that there are certain cases where using one over the other provides a great advantage. Take LRU, which seems to be the most widely accepted and most used caching algorithm. It works great when the majority of the hits go to a very concentrated group of items, because most hits, if not all, are then served from the cache. However, if there is a large scan of all the data, once the cache reaches its max size LRU will evict an item on every access. If the cache can hold a max of 50 items and you have 100 records, then as you iterate over the 100 records, the cache will throw out the first 50 records to make room for the second half, resulting in lots of adding to and removing from the cache and 0 cache hits. Algorithms that prevent this from happening, like LFU, are known as scan-resistant.

I was interested in finding out if there was some middle ground that gave me the best of both worlds, LRU and LFU. It turns out there is.
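The scan problem is easy to reproduce. The sketch below is only an illustration: it assumes the scan is repeated several times (a single pass over cold data would miss everywhere with any algorithm), uses a `LinkedHashMap` in access-order mode as a stand-in LRU cache, and the names `ScanDemo` and `countHits` are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ScanDemo {
    /** Counts hits when `passes` sequential scans over `records` keys go through an LRU cache of `capacity` entries. */
    static int countHits(int capacity, int records, int passes) {
        // Stand-in LRU cache: access-ordered map that drops its eldest entry when over capacity
        Map<Integer, String> cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > capacity;
            }
        };
        int hits = 0;
        for (int pass = 0; pass < passes; pass++) {
            for (int id = 0; id < records; id++) {
                if (cache.get(id) != null) {
                    hits++;                          // served from the cache
                } else {
                    cache.put(id, "record-" + id);   // miss: load the record and cache it
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // 100 records through a 50-entry cache: every record is evicted before it is read again
        System.out.println(countHits(50, 100, 3));  // prints 0
        // 40 records fit in the cache: every access after the first pass is a hit
        System.out.println(countHits(50, 40, 3));   // prints 80
    }
}
```

With 100 records and room for 50, each record is pushed out 50 accesses after it was loaded, which is always before the scan comes back around to it.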

The algorithm is known as Adaptive Replacement Cache (ARC). It gives you the benefits of LRU and also does a balancing act to prevent data scans from polluting the cache. It does this by keeping track of two lists, one for recently referenced items and another for frequently referenced items. If you read about it, it’s a pretty cool algorithm.

I was excited when I came across this algorithm because I thought it would make such a fine addition as an open source project. And then I discovered it was patented. Apparently, PostgreSQL already went through this exercise and deemed it safer to not use it.

So, now I’m thinking I need a new idea for a project.


Re: Young IT workers disillusioned

After reading this article, I had to vent on the ridiculousness of the article.

The article is pretty much a summary of a survey done by an IT staffing firm. It states that entry-level, 20-something year old employees are the most difficult to manage because they have high expectations from their employer. High expectations like good salaries, bonuses/rewards, and an office. Some pointy-haired boss mentioned in the article said “the problem between employers and the younger generation just entering the workforce can be traced back to the employees’ upbringing or an easier way of life for children in the United States today.”

With condescending statements like that, why would I want to work for you? This sounds like a classic “When I was your age”-ism that creates a crappy work environment.

I’m in the IT industry, I’m a 20-something year old, and I guess you could still consider me “entry-level”. I’m not disillusioned about my place in the industry. I don’t expect you to give me the title of CTO of a Fortune 500 company. I don’t need a secretary and a personal assistant. I do however expect to be paid according to the value I bring the company. Why wouldn’t someone want to be rewarded if they do outstanding work? Not having a reward doesn’t stop me from doing outstanding work. You need to find employees that hold themselves to a higher standard than you do. I personally don’t care for offices but it’s clear that having a door that shuts allows developers to be more productive due to fewer distractions.

Just because I may be an entry-level employee, that doesn’t mean I bring any less value to the employer. What qualifies an employer to treat an entry-level employee as a second-class employee? In the IT industry, where experience seems to be king, I have struggled to understand why that is. Experience is fine and dandy but it only takes you so far. Raw talent should be valued more. Between 3 rock star 20-something year olds and 3 mediocre experienced 40+ year olds, I will bet on the rock stars every single time.

It sounds to me that it’s not the young IT workers that are disillusioned but the employers. They want rock star employees without providing any incentive.


Fresh start!

After trying to keep a blog numerous times already, I am committed to keep this one going!

It took me a while to get set up after rebuilding my slice (I am hosted by SliceHost). Most of the time was spent googling the commands I had to run and figuring out missing apache modules, since this is all new to me. It’s been a good learning experience though 🙂

But after 4 hours, I’ve set up the following:

  • 256 MB Ubuntu slice
  • apache2
  • mysql
  • subversion
  • php
  • ruby/rails

Next steps are to get Java and Tomcat installed, as well as a sample Rails app up and running.
