Regular expressions: Greedy regex versus lazy regex

Here’s a typical regex scenario: You’ve got a string from which you need to find and capture the HTML tags. Let’s say our string is:

This is a <em>first</em> test

Typically, your first attempt at a regular expression to capture the tag would look like this:

var re = new RegExp("<(.+)>", "");

Unexpectedly, however, the captured group you get back is:

"em>first</em"

The reason for this is explained on the Regex Tutorial website:

The first token in the regex is <. [...] You should see the problem by now. The dot matches the >, and the engine continues repeating the dot. The dot will match all remaining characters in the string. The dot fails when the engine has reached the void after the end of the string. Only at this point does the regex engine continue with the next token: >.

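From there, the engine backtracks one character at a time until the > token can match, which happens at the last > in the string; that is why the capture runs all the way to the closing tag. What we need to do instead is make the quantifier lazy by adding a question mark after the plus sign (the same works after a star, or after a count in curly braces):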

var re = new RegExp("<(.+?)>", "");

This time, we’ll get back:

"em"
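To see both behaviours side by side, here is a small sketch that runs the greedy and lazy patterns against the sample string. The variable names are my own, and the last line (adding the g flag to collect every tag) is an extra illustration rather than part of the original example.

```javascript
var text = "This is a <em>first</em> test";

// Greedy: .+ grabs as much as it can, then backtracks to the LAST ">".
var greedy = /<(.+)>/.exec(text);
console.log(greedy[1]); // "em>first</em"

// Lazy: .+? grabs as little as it can, stopping at the FIRST ">".
var lazy = /<(.+?)>/.exec(text);
console.log(lazy[1]); // "em"

// With the global flag, the lazy pattern picks out each tag separately.
var allTags = text.match(/<.+?>/g);
console.log(allTags); // [ "<em>", "</em>" ]
```

Regex literals are used here instead of new RegExp(...) only to avoid string escaping; the patterns are the same.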

Reference: Regex Tutorial – Repetition with Star and Plus

Hacking the Budweiser Red Light (Part I): Identifying the network traffic that activates the light

The Budweiser Red Light is one of the best pieces of marketing I’ve ever seen. And while I’m enough of a hockey fan to want to pick up one of these anyway, the real prize is in figuring out how to make the thing go off whenever I choose.

Where to get started? While I’ve seen an attempt to use/modify the Electric Imp card inside the light to accept requests, I thought my approach might be simpler: Sniff the network traffic to and from the light, then replicate it to sound the alarm.

The instruments used to sniff the traffic were as follows:

  • A MacBook Pro (to set up Internet Sharing for the Red Light to connect to)
  • My Android OS smartphone (to run the app Budweiser developed for pairing with the Red Light)
  • Wireshark (a network protocol analyzer) installed on the MacBook Pro, which also requires XQuartz

Instructions

I kept careful notes in case someone else wanted to replicate this experiment; those instructions are:

  1. Enable Internet Sharing on the MacBook Pro, and make sure you set no password or key. (At home, where I have a Windows 8.1 PC, I had attempted to set up a Wi-Fi hotspot but the Red Light appeared to have difficulty getting on that network.) I named my new Wi-Fi access point “BudRedLight”.
  2. I had my Android OS smartphone join the Wi-Fi access point of “BudRedLight”, which it did without issue.
  3. In order to sniff its traffic, we’ll need to get the Red Light on the “BudRedLight” access point as well. This means installing and setting up the official Budweiser Red Lights app, and using the interesting flashing light method of sending the Wi-Fi connection details from the phone to the Red Light.
  4. Once that’s done, I left the app open on the phone, specifically staying on the screen that provided the “TEST YOUR LIGHT” button. The plan was to capture what was received by the Red Light once this was pressed.
  5. Now that the Red Light and my phone were both on the “BudRedLight” access point, it was time to boot up Wireshark on the MacBook Pro (I had to start XQuartz first so Wireshark would run).
  6. Let’s get Wireshark listening to the traffic coming in and out of “BudRedLight”: Select the Capture menu option, and then Options. Uncheck the “Use promiscuous mode on all interfaces” option (this will cut down on the amount of noise being captured). Instead, double-click on the Wi-Fi listing, and within it check the “Capture packets in promiscuous mode” option and hit OK. (Need a visual guide? Here’s a screenshot.)
  7. Okay – we’re now capturing traffic! Tap the “TEST YOUR LIGHT” button; this will make your phone send an HTTPS request up to some remote server, which in turn appears to send a TCP [PSH, ACK] packet to the Red Light. In the screenshot of Wireshark below, I pressed the “TEST YOUR LIGHT” button twice, resulting in the two [PSH, ACK] packets listed: [screenshot: bud-red-light-tcp-psh-ack-traffic]
  8. So what’s in that packet? Only 234 bytes of it contain actual data, so let’s see what that looks like (via Wireshark): [screenshot: bud-red-light-packet-inspection]

Next Steps

So we’ve got an example of the data used to set off the Red Light, but we don’t really know what’s contained in that data.

Deep packet inspection isn’t really my thing, so at this point I’ve started asking around for possibilities. Here are the early contenders:

  • Use tcpreplay, Ostinato or some other application to “replay” sending of the packet shown above to the Red Light; maybe we don’t really even need to know what’s in the packet and this will set it off.
  • Find and use some other utility (or person!) that can tell us how to further decipher what’s in the data seen above.

If you’d like to help out, you certainly can! I’ve uploaded a zipfile of a PCAP file containing the packet I’ve displayed above. Feel free to try to decipher or replay that packet on your own! Please leave a comment if you do so; it’d be great to solve this for everyone.

When “compass compile” leads to an ‘Invalid UTF-8 character “\xCA”‘ error

A few of the projects I work on have some pretty large CSS files, and we use SCSS plus the Compass tool to make management of the styles a bit easier. Recently, though, Compass has been throwing us this error:

    error scss/sportsnet.scss (Line 969 of scss/_inc-controls.scss: Invalid UTF-8 character "\xCA")

The root cause appears to be non-ASCII characters in an SCSS file. That’s easy enough to track down by hand if the file is a manageable size, but if you need an automated filter instead, try the following:

iconv -t ASCII//IGNORE -f UTF8 < _inc-controls.scss > _inc-controls.ascii

This pipes the offending SCSS file through iconv and writes out the file _inc-controls.ascii, a copy with the non-ASCII characters dropped. Comparing it against the original (with diff, for instance) shows which lines contained the non-ASCII characters.
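If you would rather get line numbers directly, a few lines of JavaScript can do the scan instead. This is only a sketch of the same idea, not something Compass or iconv provides: findNonAsciiLines is a name I made up, and the commented-out usage assumes Node.js and reads the file as latin1 so that a stray byte such as \xCA survives as a single character.

```javascript
// Report every line that contains a character outside the ASCII range.
// Returns an array of { line, column, char } objects (1-based positions).
function findNonAsciiLines(text) {
  var hits = [];
  text.split(/\r?\n/).forEach(function (line, i) {
    var m = /[^\x00-\x7F]/.exec(line); // first non-ASCII character on the line
    if (m) {
      hits.push({ line: i + 1, column: m.index + 1, char: m[0] });
    }
  });
  return hits;
}

// Hypothetical usage under Node.js:
// var fs = require("fs");
// console.log(findNonAsciiLines(fs.readFileSync("_inc-controls.scss", "latin1")));
```

Only the first offending character per line is reported, which is usually enough to jump to the right spot in an editor.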

Overwriting a branch in Git with remote results (and bypassing merge conflicts)

If you’ve ever tried a git pull to bring a branch up to date, hit conflicts, and just wanted to bypass them and take the contents of the remote branch, do the following. Be aware that this throws away any local commits and uncommitted changes on that branch:

# Fetch from the default remote of origin.
git fetch origin
# Check out the branch whose conflicts you want to override.
git checkout master
# Reset your current branch of master to origin's master.
git reset --hard origin/master

Reference: Stack Overflow: Git pull from remote.. Can I force it to overwrite rather than report conflicts?

Get a list of files added, removed and modified in Git between two branches

It’s sometimes useful to know exactly which files you’ve changed in your feature branch as you’re getting ready to merge back into your trunk code. Git’s diff command allows for this to be done quite easily:

git diff --name-status master..newfeature

The output looks like the following:

M       .gitmodules
A       plugins/backplane.php
A       plugins/multiple-post-thumbnails
M       themes/sportsnet/css/scss/_inc-controls.scss
M       themes/sportsnet/functions.php
M       themes/sportsnet/header.php
M       themes/sportsnet/single-sn-article.php
M       themes/sportsnet/single-sn-blog-entry.php
M       themes/sportsnet/single-sn-signing.php
M       themes/sportsnet/single-sn-trade.php
A       themes/sportsnet/zones/articles-comment-form.php
A       themes/sportsnet/zones/global/user-account-links.php
(END)
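If you want to feed that listing into another script, it is easy to group by status letter. The following is a rough sketch under my own assumptions: groupByStatus is a made-up helper name, and it expects the text exactly as shown above (one status letter, whitespace, then the path).

```javascript
// Group `git diff --name-status` output by status letter (A, M, D, ...).
// Lines that do not look like "<letter> <path>" are skipped, which includes
// blank lines and the pager's "(END)" marker.
function groupByStatus(output) {
  var groups = {};
  output.split("\n").forEach(function (line) {
    var m = /^([A-Z])\d*\s+(.+)$/.exec(line);
    if (!m) return;
    if (!groups[m[1]]) groups[m[1]] = [];
    groups[m[1]].push(m[2]);
  });
  return groups;
}

var sample = [
  "M       .gitmodules",
  "A       plugins/backplane.php",
  "M       themes/sportsnet/header.php",
  "(END)"
].join("\n");

console.log(groupByStatus(sample));
```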

Reference: Stack Overflow – Showing which files have changed between git branches

Where all of my placeholder images come from

Years back, I happened upon fractalsponge.net, a one-man effort in creating extremely high resolution Star Wars renders of the various spacecraft in that fictional universe.

Downloading more than 5,000 rows from Google Analytics at a time

I’ve been increasing the amount of click tracking I do via Google Analytics, but getting the data back out of that system can be a bit of a hassle when you’re talking about large amounts of data.

Enter Download Analytics, which will ask for input of a URL to the report page you’re viewing, and will automatically prompt to e-mail you (when ready) a link that will allow you to download the entire list of rows in your report. Simple, fast, effective.

Using Mozilla Thunderbird to access your Outlook Web Access (OWA) e-mail account

Like many others, my employer makes heavy use of Microsoft’s Exchange system to run its corporate e-mail infrastructure. And for the most part I think highly of that collection of software and services: Microsoft Outlook and the Exchange system that runs behind it are full-featured and generally terrific. Unfortunately, its Web-based offering is pretty lame in comparison, and is generally set up on hardware that strains to keep up with the amount of use that it faces.

I’ve always liked Mozilla’s free Thunderbird e-mail client, and have tried in various ways to get it to pull down e-mail from Outlook Web Access (OWA) without success. Having finally sat down and considered the problem carefully, I’ve hit on success. The key is an amazing little piece of free software called DavMail Gateway, which installs on your local system and acts as a translator between your e-mail client and the remote OWA server. For posterity, I thought I’d capture how my setup works.

  1. Download and install DavMail Gateway to your system. On my Linux (Ubuntu 13) system, I followed the Debian package instructions, and actually did not apply any extra patches to get it running successfully.
  2. Next, I booted up DavMail Gateway from my Applications menu, which presented me with a setup screen. Here, I copied the settings I found in another post on the web and visible in the image shown above, but I’ll break them down again in specifics…
  3. Change the OWA URL value to http://webmail.rci.rogers.com/exchange/ .
  4. Leave all other port settings as-is (specifically IMAP as 1143 and SMTP as 1025) and press the Save button.
  5. Start up Mozilla Thunderbird and create a new account.
  6. Enter your Full Name, E-mail Address and the Password that you would use to access your e-mail regularly via OWA. Click the Test button; when it fails, you’ll be presented with a new set of fields.
  7. Your Incoming settings should be as follows: A server type of IMAP; a Server Hostname of “localhost” (no quotes), a Port of 1143, an SSL setting of “None”, and Authentication set to “Normal Password”.
  8. Your Outgoing settings should be as follows: A server type of SMTP; a Server Hostname of “localhost” (no quotes), a Port of 1025, an SSL setting of “None”, and Authentication set to “Normal Password”.
  9. Just one last field to set; have your Username value be your username without any domain: In my case, that would simply be “sully.syed” (no quotes).
  10. Click the Re-test button; this time, Thunderbird should let you proceed, and will begin the process of synchronizing your e-mail client with your mail. Welcome to webmail via Thunderbird!

Checking Akamai cache expiry times on your website’s pages

To check how long Akamai caches a page, send a custom Pragma header along with your HTTP GET request, using either the wget command line tool:

wget -S -O /dev/null --header="Pragma: akamai-x-cache-on, akamai-x-cache-remote-on, akamai-x-check-cacheable, akamai-x-get-cache-key, akamai-x-get-extracted-values, akamai-x-get-nonces, akamai-x-get-ssl-client-session-id, akamai-x-get-true-cache-key, akamai-x-serial-no" http://www.sportsnet.ca/

Or the curl command line tool:

curl -H "Pragma: akamai-x-cache-on, akamai-x-cache-remote-on, akamai-x-check-cacheable, akamai-x-get-cache-key, akamai-x-get-extracted-values, akamai-x-get-nonces, akamai-x-get-ssl-client-session-id, akamai-x-get-true-cache-key, akamai-x-serial-no" -IXGET http://www.sportsnet.ca/

The X-Cache-Key header in the response will contain the amount of time the URL is cached for; in this example, the time is 1 minute (“1m”):

X-Cache-Key: /L/370/77322/1m/www.sportsnet.ca/
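If you are checking a lot of URLs, pulling the expiry out of that header by script saves some squinting. The sketch below rests on two assumptions of mine, neither of them documented Akamai behaviour: that the time is always the fourth slash-separated field (as in the single example above), and that the unit suffixes are s, m, h and d. The function name ttlFromCacheKey is hypothetical.

```javascript
// Extract the cache time from an X-Cache-Key value such as
// "/L/370/77322/1m/www.sportsnet.ca/".
// ASSUMPTION: the time is the 4th non-empty field; suffixes are s/m/h/d.
function ttlFromCacheKey(cacheKey) {
  var parts = cacheKey.split("/").filter(function (p) {
    return p.length > 0;
  });
  var raw = parts[3];
  var m = /^(\d+)([smhd])$/.exec(raw || "");
  if (!m) return null; // field missing or in a format this sketch does not know
  var unitSeconds = { s: 1, m: 60, h: 3600, d: 86400 };
  return { raw: raw, seconds: Number(m[1]) * unitSeconds[m[2]] };
}

console.log(ttlFromCacheKey("/L/370/77322/1m/www.sportsnet.ca/"));
// { raw: "1m", seconds: 60 }
```

Returning null for anything unrecognised keeps the function from guessing when the header does not match the one layout seen here.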

Running Windows? No problem – grab a compiled version of wget for Windows.

Source: Stack Overflow – What’s the best way to troubleshoot Akamai headers these days?

Costco: An internal promotion trumps an MBA any day of the week

I noticed this as well, but it took a blog post in The Washington Monthly to crystallize it in my mind.

The Washington Monthly – Political Animal – The secret of Costco’s success revealed! (hint: no MBAs need apply)

… Costco does not hire business school graduates—thanks to another idiosyncrasy meant to preserve its distinct company culture. It cultivates employees who work the floor in its warehouses and sponsors them through graduate school. Seventy percent of its warehouse managers started at the company by pushing carts and ringing cash registers.

Those sentences speak volumes. They tell you that Costco is a company that values its own hard-won experience over trendy B-school subjects like management theory and Econ 101 abstractions. They’ve found a formula that works and they’re not going to mess with it. I’ve long found the typical B-school curriculum to be problematic. On the one hand, you have management “theory,” which frequently is not well-supported by rigorous research, and might be characterized as more theological than anything else — Tom Frank has often been insightful about the ideological function served by this kind of business literature.

Then, on the other hand, you have B-school economics. One of the great sins about economics as a university subject is that, particularly at the introductory and intermediate levels where people are most likely to study it, the econ that gets taught tends to be almost entirely theoretical, not empirical. Few economists understand how businesses work, because few of them have actually bothered to ask businesspeople how they make business decisions. Instead, they make assumptions. But even assumptions that seem highly plausible in theory can turn out to be wildly off-base in fact.

Getting back to Costco: the abstract theorizing that MBA students learn in microeconomics courses often has little relevance to practical business situations. The simplified textbook models teach the lesson that policies like unions and the minimum wage are inefficient and wrong — that message comes through loud and clear. Economics as it’s taught in most American colleges today more or less encourages poor labor practices.