2012.01.12
At its core, SOPA represents a universal, base desire of men to protect what they own. To some degree, it’s valid, something that anyone can respect. However, as with most things, ideas mean precisely shite, and execution is what really matters. In the case of SOPA, a misguided, poorly informed government is attempting to execute an idea in a way that could do the most damage possible. Let me count the ways SOPA is fucking stupid.
SOPA threatens free speech. This has been said by many folks who are much smarter than I. Any bill that allows the silencing of voices should not see the light of day. ‘Nuff said.
SOPA will starve and fracture an industry. A measure that could be taken against an “infringing” site is forcing advertising networks to stop serving ads for said site. For many sites, advertising is their meat and potatoes. An equally despicable measure is forcing payment services (can you say PayPal?) to stop processing transactions for the infringing site. Any site that relies on e-commerce is now proper fucked. Any service that provides a payment gateway or ad services is now unreliable. There will be a rift between providers that cooperate with SOPA, and those that don’t.
SOPA will break the one thing that makes the Internet accessible to humans, DNS. DNS is the system whereby domain names are translated to network addresses. Assigning IP addresses easy-to-remember names is one of the reasons the Internet has become a viable medium. As an extreme measure, SOPA will alter a site’s DNS records to point somewhere else. This last measure makes it pretty clear the authors of these bills are complete dipshits.
While altering DNS will render the site inaccessible to most, it does not remove the existence or accessibility of content from the Internet. This very post is available here, whether DNS is up or not. To cope with a broken DNS system, the Internet will respond, and it will not be pleasant. Hardware vendors will ship with host files set up to protect their own interests. Rogue DNS resolvers will pop up. The Internet will turn into Bartertown. Two browsers will enter, neither will find Facebook.
The internet industry and e-commerce have proved to be the country’s highest growth sectors in the past few years. One of the main contributors to that growth has been the availability of honest, reasonably reliable, interconnected services. They’ve given the classic humble entrepreneur + code monkey team the tools to build a business that yields riches. Compromise those tools, and you will destroy an industry, not to mention perhaps the last golden vestige of American opportunity. Creating a market-based system to punish violators will only destroy the system. To help combat SOPA, contact your local congressperson, or go here. To read the bill, click here.
2012.01.03
Normally, I’ve considered New Year’s Resolutions to be for whiners, people who never actually accomplish anything, people who are normally on the whhhaaaambulance. Most resolutions fall somewhere in the “stop being fat” to “be fluent in mesopotamian glyphs” range. They’re vague, completely un-actionable, and just describe a slightly unattainable goal / end result / dumbass want. This year, however, I’ve discovered a few serious problems in my life I need to fix. So, in the spirit of actually doing shit, I’ve provided a list of non-whiner resolutions + ways to actually make them happen.
1. Go to bed early(ish). This is a tough one for me. I’m naturally a night owl, but I might be able to fool myself into getting to bed early by showering early(ish). I love me a good shower. I’ve even been known to drink a beer in the shower. I also really like being in really warm, comfortable, if slightly embarrassing, PJs. All these things put me in a good mood and generally make me want to relax, which is not all that far from being asleep. Action Steps: Just get in the fucking shower, don’t look at the mail / email / dirty dishes / messy apartment / email / Twitter, just get in the shower (bring a beer).
2. See some doctors. I’ve avoided doctors for a while, mostly because my lifestyle is a cross between Dennis Nedry from Jurassic Park and a barfly. I consider this pretty simple. Action Steps: Make appointments with the following: dentist, general physician, eye doctor, nutritionist. Do it. Do what they say, even if it sucks. Follow up as often as the quacks say so.
3. Go to the gym regularly. OK, this one is, without a doubt, the most cliche, whine-tastic resolution evar. I know, because I have been to the gym in January. I’ve also been to the gym in April, when all the kids who were at the gym in January are nowhere to be found. Also, since the gym is a #creepy and #gross place to shower, resolution #1 should be even more important. Action Steps: Put that ish on the Google calendar with the following reminders: 2 hours, 1 hour, 30 minutes, 15 minutes, 10 minutes, and 5 minutes before. Keep gym clothes in the office. Don’t care how bad they smell.
4. Blog more. Writing has been a great way to get me to collect my thoughts, find some hindsight, and maybe, just maybe, help some other folks who have the same demented thoughts / stupid problems. As a technical guy, “the inspiration” doesn’t hit me so often, and when it does, I’m often busy, y’know, actually doing shit. However, as I’ve noted to myself more than once, keeping track of my day and journaling how I spend my time is something incredibly important for introspection. Action Steps: Write that thought down. Write down what you did 30 seconds ago, especially if it was different from regularly scheduled programming. Keep a sticky on your monitor to write shite down. Ask the dude next to you (@bossjones) to remind you to write shit down. Next, a glass (or 7) of white wine, the notebook in which all your shit is written, and WordPress should convene regularly. Google Calendar #ftw, again. Lastly, check Google Analytics on posts. The un-monitored blog post is not worth writing.
5. Read more. Once upon a time (yesterday) I didn’t know nearly as much as I do now. Most of that knowledge came from reading shit-tons of blogs, books, bathroom graffiti, articles, and whitepapers related to web development. I read everything with a goal: How can I use, or leverage this to help me / my business work a little better? The Action Steps here are a bit tougher, and slightly conflict with non-whiner resolution #6: Keep Google Reader open. Curate my list of feeds with relevant sources. Prune feeds that stopped providing useful information. Lastly, and perhaps most important, find tidbits of information that make a difference in my life and / or business.
6. Don’t be distracted by bold numbers in parentheses. Simple (kinda). Action Steps: close Gmail, close Twitter. Try, and #fail, to delete my Facebook account.
7. Stop playing so much fucking air guitar by myself, alone in my apt, and start playing some real guitar, and actually learn the songs I normally rock out to. Action Steps: restring the Epiphone, buy a new amp, find tabs for shit I want to learn. If I’m feeling really frisky, get back into a band.
2011.11.26
Sushi restaurants hold a special place in my heart as possibly the worst place to go for a meal. Granted, I have not been to many, but the ones I have been to all share the same horrible characteristics.
The service always has a certain briskness to it. To many, this is great: you order, get your food, eat, and have your plates taken away almost as the last piece of sashimi leaves your chopsticks. Rice notwithstanding, once most of your food is gone, so is your plate, replaced almost as quickly with the check. While many consider speed to be a feature, I’m not sure I agree when it comes to my dinner. While the staff at every sushi joint I’ve ever been to has been efficient, they’ve also been less than accommodating when it comes to simple things, like recommendations.
Ambiance and fellow patrons normally leave something to be desired as well. Somehow, sushi has come to be the meal of choice for the screaming hordes of clubgoers, Jersey Shore wannabes, and that certain type of douchebag that only comes out at night. Thus, the soundtrack of most of these places closely resembles being inside a speaker cabinet while DJ Pauly D spins whatever the fuck it is that he spins. Again, nothing wrong with that, but not while consuming raw fish.
Then, there’s sushi itself. Don’t get me wrong, I like sushi, particularly a real nice piece of toro when it’s nice and cold. And I’ve been adventurous enough to have tried some more exotic options, like uni, which is without a doubt, a taste you need to acquire. I also understand that it’s considered an art form. But what bothers me most about going out for sushi is the vast majority of places do not regard it that way.
2011.11.02
Yesterday, I had to run a query for some statistics I needed. This was a query that I knew was going to be particularly nasty, as it required sorting 1.3M rows. Normally I run these sorts of queries on a reporting slave I keep around for this reason, but for some reason I chose to run this query on a production slave. When I ran my query, I got the following error:
ERROR 3 (HY000): Error writing file '/tmp/MYNcSyQ9' (Errcode: 28)
Oh. *&^%. After some Googling, a bit of shitting my pants, and a wild grep session through as many application logs as I could find, I was able to figure out that the problem seemed limited to this particular query. My Googling turned up the fact that the error code indicated that the server was out of disk space.
As a rapidly growing company, we’ve had our fair share of issues with managing (or failing to manage) rapidly filling disks, failed RAID controllers, and the like. However, I had recently done audits of this particular cluster of servers, and ascertained that the situation with disks was nominal. I was confident the disk wasn’t full, and permissions were correct. Our particular disk layout puts /tmp on its own 2GB partition, and after running the query, that partition was 2% full.
It turns out that during the execution of the query, MySQL was creating a temporary table that was 2GB, hence the error. By default MySQL will write temporary tables to /tmp, which in many cases is its own small partition. The solution here was to set the tmpdir to a folder on the main partition adjacent to the MySQL datadir. This solution obviously has its own problems (i.e., you could fill your main partition, which is way worse than filling /tmp). However, for this type of ad hoc query, this was exactly what we needed.
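For anyone hitting the same wall, here is a minimal sketch (in Python, not anything I ran at the time) of the two checks that would have saved me some panic: decoding the OS error number, and checking free space on a candidate temp directory before pointing tmpdir at it. The function names are made up for illustration. Note that checking the partition after the query fails is misleading, presumably because the temporary file is already gone by then.

```python
import errno
import os
import shutil


def explain_errcode(code: int) -> str:
    """Return the symbolic name and message for an OS error number."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def has_room(directory: str, needed_bytes: int) -> bool:
    """True if the filesystem holding `directory` has at least `needed_bytes` free."""
    return shutil.disk_usage(directory).free >= needed_bytes


if __name__ == "__main__":
    # On Linux, error number 28 is ENOSPC ("No space left on device").
    print(explain_errcode(28))
    # Hypothetical check: would a 2GB temporary table fit in the current directory?
    print(has_room(".", 2 * 1024**3))
```

The free-space check is only a snapshot, so it is a sanity check before running a heavy query, not a guarantee.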
2011.07.31
Creative Index
The now defunct Creative Index was a search engine aimed at indexing portfolio sites. The Creative Index was perhaps the most open-ended project I have ever taken on. The goal was to allow people to list their various portfolio sites, have the Creative Index scrape their sites, index all the textual content, and make it searchable via a Google-like interface. Of course, the project was doomed from the beginning, as results would be measured against Google, and we all know how that goes. I had also never taken on a project that dealt with such unstructured data, which was another reason it was doomed. Most portfolio sites contain very little text, which makes matching and ranking difficult.
And that’s when I discovered the Mars Volta. While writing the engine to handle the retrieval of web pages, I learned just how chaotic the underbelly of the Internet is. Circular redirects, 404s, bad links, and authenticated pages made my code check hundreds of variables in the most paranoid, chaotic way possible. The Mars Volta’s drug-induced, hallucination-inspired free-form rock-jazz-samba was a great soundtrack to the chaos I was trying to make sense of.
Creative Portfolio Display
In July of 2010, I had the distinct honor of developing one of the few InApps for LinkedIn. LinkedIn’s InApp platform runs on Google OpenSocial. OpenSocial is a great way to plug in 3rd party apps in a secure way. However, the normal development workflow changes quite a bit, as OpenSocial acts as a caching proxy. So in order to get changes in your app down to the user/tester/you, you need to set an additional variable that will re-retrieve the specification for your app. In order to get to that, you need to find the URL to the iframe that contains your app, which is only available in a JavaScript block, add the cache busting variable, drop the URL in your browser, and hard refresh. That only worked sometimes. And when it did work, it was pretty much guaranteed that the change you made didn’t.
Needless to say, the workflow was painful, even on the best days. Add to that, some weird firewall issues, and you had a situation that would make St Francis of Assisi murder kittens. That’s where Passion Pit came in. Their music is just so…damn…happy. In most cases Passion Pit saved me from putting my fist through my monitor.
RightScale, Rackspace Cloud configuration
In an attempt to save ourselves some money, and automate a lot of the SysAdmin work I’d been doing by hand over the past couple years, I undertook a partnership with RightScale. Since in every case the servers I was deploying didn’t have PHP, or any other language I knew, installed by default, I had to resort to bash, which I didn’t know. This project took me very far out of my comfort zone, had a Mt Everest of a learning curve, and was so essential to our growth that it couldn’t fail. There were also tons of moving parts that were out of my control. RightScale’s integration with the Rackspace cloud is in Beta, which meant that in addition to struggling through a language I didn’t know, I had to differentiate my own errors from things that were problems with the server images. Tons o’ fun.
In stepped The Bronx (III), probably one of the most solid rock bands I’ve heard in a long time. Their tracks had a real sense of purpose, and the lyrics echoed a lot of my desperation. In particular, the line in Pleasure Seekers where desperation is cited as inspiration totally got me through.
2011.06.25
Part of managing any large site involves writing scripts that will go through your data, make changes, merge things, remove things, do type transformations, etc. Most of the time, in PHP, iterating through rows or objects will do just fine. However, when there are lots of rows or objects, you could be faced with a script that takes hours or days to run. Depending on how active the site is, you may need to restrict access to ensure that the data before and after the transformation remains consistent. In other words, if someone tries to make a change to the data before the transformation, and the new feature only looks at data after the transformation, that user has just lost their changes. That is Very Bad.
As sites get larger and problems like this loom, taking the site offline becomes less and less of an option. This is what the business team calls a luxury problem, and what the ops team refers to simply as a problem. One option is to write a more efficient script. You can get pretty far by simply ensuring you’re reading from the fastest data source available, making good use of cache, and ensuring that the tables being read for the transformation are properly indexed. All of these are great places to start. Additionally, making sure that data is grabbed in chunks can give the database time to breathe. There’s nothing worse than getting stuck in MySQL’s “sending data” phase simply because it needs to read several thousand rows from disk. MySQL configuration can also be your friend here. If using InnoDB, increasing the insert buffer is a great way to speed up writes.*
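To make “grabbed in chunks” concrete, here is a rough sketch in Python. It uses the standard library’s SQLite driver as a stand-in for MySQL so it runs anywhere, and the `items` table and its columns are made up for the example. It pages on the primary key rather than using a growing OFFSET, which is one common way to keep each read small; treat it as an illustration, not the script I actually ran.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_chunks(conn: sqlite3.Connection, chunk_size: int = 1000) -> Iterator[List[Tuple[int, str]]]:
    """Yield rows from the hypothetical `items` table in primary-key order, one chunk at a time.

    Assumes ids are positive integers, so starting from 0 misses nothing.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Resume after the last id seen, so each query only touches one chunk.
        last_id = rows[-1][0]


def demo() -> List[int]:
    """Build a tiny in-memory table and return the size of each chunk read."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, 11)],
    )
    return [len(chunk) for chunk in iter_chunks(conn, chunk_size=4)]
```

With ten rows and a chunk size of four, `demo()` reads chunks of 4, 4, and 2 rows.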
However, as much as you can do to speed up a single transaction, the fact remains that you have to execute each transformation serially, one after another. Your bottleneck is the transformation itself. It will take (time per transformation * # of objects to transform) to complete the job. No matter how well tuned the database is, it will only be performing one operation at a time, which means that the other (max connections – 1) connections are doing precisely crap. So the next logical step is to change your update script to distribute the update operations so a few can be run in parallel.
Rewriting the update script does require thinking about your update differently, and will not work in every case. For example, if one is simply moving a large amount of data from one table to another, and there is no transformation, or the transformation can be accomplished via a builtin MySQL function, use that instead. Just be prepared to deal with locking issues, and the source data potentially not being available while the transformation is taking place. If your transformation is complicated and requires per-case logic, though, distributing it is definitely a good route to take. The biggest difference is how the code for the update is organized. The update script needs to be separated out into code that will apply the transformation for exactly one entity, and code that will manage which entities get transformed and when. Ideally, the code for the transformation is idempotent, so failures can be handled by simply resubmitting the entity / object to be transformed again.
Accomplishing parallel processing in PHP can be kind of tricky. PHP’s pcntl_exec function has always felt a bit finicky to me. Of course exec on its own is blocking, so that’s out. Additionally, neither of these solutions offers any sort of baked-in communication between the process that submitted the job, and the process carrying out the job. That leaves us with a queuing system. Popular systems include RabbitMQ and Gearman. Personally, I’ve made great use of Gearman. It’s easy to install, as is the PHP module.
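The split between “transform exactly one entity” and “decide which entities get transformed and when” can be sketched roughly like this. To be clear about the assumptions: this is Python, not PHP, a thread pool from the standard library stands in for a real queue like Gearman, and `transform`, `run_all`, and the dict-based `store` are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List


def transform(store: Dict[int, str], entity_id: int) -> None:
    """Hypothetical per-entity transformation: upper-case one record.

    Idempotent: running it twice leaves the same result as running it once.
    """
    store[entity_id] = store[entity_id].upper()


def run_all(
    store: Dict[int, str],
    entity_ids: Iterable[int],
    worker: Callable[[Dict[int, str], int], None],
    max_workers: int = 4,
    max_attempts: int = 3,
) -> List[int]:
    """Dispatch one job per entity, resubmit failures, and return ids that never succeeded."""
    pending = list(entity_ids)
    for _ in range(max_attempts):
        if not pending:
            break
        failed = []
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = {pool.submit(worker, store, eid): eid for eid in pending}
            for future in as_completed(futures):
                if future.exception() is not None:
                    # Safe to resubmit only because the worker is idempotent.
                    failed.append(futures[future])
        pending = failed
    return sorted(pending)
```

Because the worker is idempotent, the manager can treat every failure the same way: put the id back in line. Anything still failing after the last attempt is returned so a human can look at it.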
To sum up, performing large data updates via a distributed system is the way to go if you have complex requirements per transformation, and the option to perform these processes in parallel.
*If using MySQL’s MyISAM engine, this isn’t necessarily true, as writes will block, and the database could become the bottleneck. However, since MySQL is continuing to push InnoDB, this is getting increasingly unlikely. So if your tables are all InnoDB, you’re probably in good shape.
2011.06.13
While flying to Austin for SXSW, I had a small programming task: take a string of a few search terms, break it apart, and highlight those terms in another string. It’s a straightforward task, and probably a wheel that’s been reinvented thousands of times in the history of computer science. I approached it as an exercise, to see if I could add another squeaky wheel to the pile. My goal was to do it without using any 3rd party code or any resources. I had no access to documentation, Google, Stack Overflow, or any of the other resources I use constantly to get my job done every day.
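For the curious, the task itself looks something like the sketch below. This is not the code I wrote on the plane; it is a small Python illustration using only the standard library, and the function name and the default `<b>` markers are assumptions made for the example.

```python
import re


def highlight(query: str, text: str, before: str = "<b>", after: str = "</b>") -> str:
    """Wrap every case-insensitive occurrence of each whitespace-separated query term in markers."""
    terms = query.split()
    if not terms:
        return text
    # Longest terms first, so "cats" is tried before "cat" in the alternation.
    terms.sort(key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)
    return pattern.sub(lambda match: f"{before}{match.group(0)}{after}", text)
```

Escaping each term matters: without `re.escape`, a search for something like `c++` would be read as a pattern instead of literal text.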
The code that I produced was bloated, naive, and horribly inefficient (I suspect). While writing it, I knew I wasn’t really on the right path. When I got back to New York, I took a look at it, and more or less decided I had wasted my time. Then I realized I had written it on a plane, and had nothing better to do. I simply got myself into the zone, and wanted to work through a problem until it was solved. After I got over my initial disgust, I wondered what aside from boredom and stubbornness had prompted me to complete the task.
I never really came to any conclusions until a few days later. I was going about my day normally, fixing bugs, writing emails, troubleshooting. As I hit a hard spot, something I couldn’t figure out, I gave up staring at the code, and turned to Google. Then I came across a built-in PHP function that was giving me a strange result. After puzzling for a few seconds, I dropped the function into Google. A little while later, I was examining the results of an EXPLAIN statement in MySQL, and the output was something I hadn’t seen before. I found the answer on StackOverflow a few minutes later.
Then it dawned on me. Maybe I don’t actually have the skills to be a web developer, and I’ve faked it all these years. Maybe I don’t know all that much about MySQL, and perhaps I only know enough about Linux to cause problems for Rackspace. Whether or not that’s true, I did realize that I’m pretty good at finding solutions to problems from the collective experiences, wisdom, and flames of the Internet. Maybe it’s not entirely fair to say that I faked my way through several years of a career. After all, the code that I’ve put together over the years to answer various questions, or sift through or collect data, serves a purpose, performs relatively well, and is serving people every day. Also, that disgusting snippet of string highlighting code works pretty well, despite the fact that I hate its face and want it to die.
After I got myself out of my existential development funk, more questions came to mind. First, how the F did anyone get any answers to tough questions before the Internet? Secondly, how did programmers back in the day find any sort of direction? Books on technology and programming are great, don’t get me wrong, but you can’t get answers to complicated questions. After having these thoughts crop up, I spent a little bit of time looking over other devs’ shoulders at the office. What I saw was very reassuring, as the Google machine was often hard at work for the rest of the team. The php site, StackOverflow, and QuirksMode were in browsers constantly.
Which begs yet another question: what exactly does it take to be a web programmer? Based on my experience, it seems to boil down to an Internet connection, Google, tenacity to the point of stupidity, and decent search skills. To back up even further, is it possible to take on a job you know nothing about, and learn how to do it via the Internet?
2011.03.16
During my time at the interactive portion of SXSW, I was looking for great technical panels on practical ways to improve my technical skills. While I found a bunch of panels that addressed some interesting issues, I don’t think I saw any server side code the entire time I was there. There were a number of great CSS and HTML5 talks, but aside from the PHP workshop that was listed at the wrong time in the booklet, I found no practical talks for backend developers.
Over the past couple years, it seems like there has definitely been a shift in how SXSW views technical talks. Two years ago, there were developers presenting great content on how to structure APIs, write great PHP, and develop iPhone / Android Apps. This year, the PHP workshop had the incorrect time printed in the booklet. The technical talks this year seemed to be designed for the nontechnical. The discussions around scaling included big names quoting scaling statistics that are sure to be taken out of context and read as homily by technical managers everywhere. The Android Developer meetup was almost completely devoid of Android devs, just sharks looking for them, myself included. The panel on the death of the RDBMS painted a rosy picture of what databases could be, but did not include a mention of a single technology that fit the presenter’s pipe dream.
The ubiquitousness of the term social media at SXSW leaves me with one conclusion. SXSW has changed from a gathering that was about people doing cool stuff to a group of people talking about stuff they think is cool right now. There has been a lot of discussion on conversation, but not much conversation on how to create things worth talking about.
2011.03.13
A classic UX problem is communicating to users how long they’ll have to wait before their task completes. A spinner or progress bar provides feedback that the system is, in fact, doing something, and a progress bar also hints at how long that task may take. Psychologically, progress bars create tension while progressing, and resolution when completed.
From a technical standpoint, progress bars are black magic. The developer is attempting to estimate a task based on potentially thousands of variables. In the case of a file upload, the developer has to deal with differing network conditions, disk performance, etc, etc, etc. Then they have to write the code to communicate what is happening to the browser. Not a trivial task. However, when executed well, a progress bar can provide the user with reasonable feedback about their task.
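The simplest version of the estimate is just arithmetic on what has happened so far. Here is a minimal Python sketch of that naive approach, assuming the transfer rate stays constant (which it will not, hence the black magic). The function names are made up for illustration.

```python
from typing import Optional


def percent_complete(bytes_done: int, bytes_total: int) -> int:
    """Whole-number percentage of the upload finished, capped at 100."""
    if bytes_total <= 0:
        return 0
    return min(100, int(100 * bytes_done / bytes_total))


def estimate_remaining_seconds(
    bytes_done: int, bytes_total: int, elapsed_seconds: float
) -> Optional[float]:
    """Estimated seconds left, assuming the average rate so far holds.

    Returns None when nothing has transferred yet, since no rate can be computed.
    """
    if bytes_done <= 0 or elapsed_seconds <= 0:
        return None
    rate = bytes_done / elapsed_seconds  # average bytes per second so far
    remaining = max(bytes_total - bytes_done, 0)
    return remaining / rate
```

For example, 25 of 100 bytes in 5 seconds is a rate of 5 bytes per second, so the remaining 75 bytes are estimated at 15 seconds. Real progress bars usually smooth the rate over a recent window so one slow moment does not swing the estimate wildly.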
Lately, sites like LinkedIn, Mint.com, and OKCupid have used that same tension to motivate users to completely fill out their profiles. During profile creation, a progress bar is displayed indicating how far the user has come along. Once the user completely fills out their profile, the progress bar hits 100%, and what changes? In most cases, nothing. The progress bar is just a psychological hack to entice users to go through the entire process.
The question is: Exactly how effective is the progress bar at enticing users to fully complete the task at hand? And is it actually worth it?
2010.10.22
Things never go wrong at convenient times, like when you’re auditing the latest, coolest version of your app, and looking for bugs. Things have a funny way of working out fine then. However, as soon as you look the other way, a multitude of problems come out of the woodwork. It usually goes something like this:
One server goes down, and the system that was supposed to fail silently starts screaming. The application it was supporting goes down, because the proper timeouts and error handling were never written. You can’t fail over, because failing over will take down 2 other applications. When that first server comes back up, nothing works, because the proper startup scripts were never put in place. Once the right services start, if you can remember what the hell they were, you find the original application is configured wrong. Not only is it configured wrong, it’s always been configured wrong, and no one noticed. No one noticed because it only explodes in the exact set of horrible circumstances you have right now. Which is, by the way, being down.
It’s an all-too-familiar story, and one that even the most anal of admins has dealt with. The fact of the matter is that it is going to happen, and there’s not a whole lot you can do to prepare, other than randomly pulling plugs out of servers. But any mistake that causes downtime should only happen once. A proper postmortem needs to be done to figure out what went wrong where. Once all the variables are understood, the next step is to duplicate the same set of circumstances in your sandbox, and apply the necessary error handling.
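As a rough idea of what “the necessary error handling” can look like, here is a small Python sketch of a call that gives up after a timeout and falls back to a default instead of taking the whole application down with it. Everything here is an assumption made for illustration: the function names, the fallback value, and the use of a thread pool to enforce the timeout.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_with_fallback(operation, fallback, timeout_seconds: float):
    """Run `operation`; return `fallback` if it raises or takes longer than the timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(operation)
    try:
        return future.result(timeout=timeout_seconds)
    except Exception:
        # Covers both a timeout and an error raised inside the operation.
        return fallback
    finally:
        # Do not block on an abandoned call; it keeps running in its background thread.
        pool.shutdown(wait=False)


def slow_dependency() -> str:
    """Hypothetical stand-in for a service that has stopped responding quickly."""
    time.sleep(0.5)
    return "fresh data"
```

Calling `call_with_fallback(slow_dependency, "cached data", 0.05)` returns the fallback instead of hanging. Whether a stale default is acceptable depends on the application, which is exactly the kind of decision a postmortem should force.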
Downtime and emergencies are a part of running any site. What’s really important is to treat emergencies as an opportunity to learn about what happens when systems fail, for real.