elehack.net

Michael's Blog

Bayesian Statistics

If you’ve been following my Twitter stream, you have probably seen that I’m doing some reading and study on Bayesian statistics lately. For a variety of reasons, I find the Bayesian model of statistics quite compelling and am hoping to be able to use it in some of my research.

Traditional statistics, encompassing well-known methods such as t-tests, ANOVA, etc., comes from the frequentist school of statistical thought. The basic idea of frequentist statistics is that the world is described by parameters that are fixed and unknown. These parameters can be all manner of things: the rotation rate of the earth, the average life span of a naked mole rat, or the average number of kittens in a litter of cats. It is rare that we have access to the entire population of interest (e.g. all mature female cats) to directly measure the parameter, so we estimate parameters by taking random samples from the population, computing some statistic over the sample, and using that as our estimate of the population parameter. Since these parameters are unknown, we do not know their exact values. Since they are fixed, however, we cannot discuss them in probabilistic terms. Probabilistic reasoning only applies to random variables, and parameters are not random; we just don’t know what their values are. Probabilities, expected values, etc. are only meaningful in the context of the outcome of multiple repeated random experiments drawn from the population.
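As a minimal sketch of the estimation step described above, the following computes a sample mean and uses it as a point estimate of the population mean. The litter sizes are made-up illustrative values, and `mean`, `litter_sizes`, and `estimate` are hypothetical names, not taken from the post.

```ocaml
(* Sample mean: a statistic computed over the sample and used as the
   point estimate of the (fixed, unknown) population mean.
   Returns nan for an empty list. *)
let mean xs =
  List.fold_left (+.) 0.0 xs /. float_of_int (List.length xs)

(* Hypothetical litter sizes from a random sample of five cats. *)
let litter_sizes = [4.0; 5.0; 3.0; 6.0; 4.0]

(* Our estimate of the average number of kittens per litter. *)
let estimate = mean litter_sizes
```

A different random sample would give a different estimate; in the frequentist view the randomness lives in the sampling, not in the parameter.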

Read more...

Sales Tax and E-Commerce — Not a Simple Problem

In the age of e-commerce, sales taxes are a difficult problem. Currently, online retailers such as Amazon.com are under the same rule as mail-order sellers traditionally have been: they only have to collect sales tax from customers located in their state (or states in which they have a physical presence). Recently, this has gained greater attention because several states have passed measures to count affiliate program members (kickbacks for links on blogs, etc.) as a physical presence, so Amazon.com would be required to collect sales tax from customers in any state in which one of its affiliates resides. These affiliates are often private individuals and do no direct sales for Amazon, only referrals, but in an effort to regain their tax base, states want to treat them as a physical presence.

I do not question that e-commerce is presenting significant problems for local economies. Money spent online does not stay in the local economy, and moving sales out-of-state does decrease the tax base for state income taxes. While most states require residents to pay sales tax themselves on out-of-state purchases, it is likely that few actually do so. Particularly in the present time of tight state budgets, this certainly isn’t helping matters.

Read more...

Filter Bubbles and DuckDuckGo

Today, DuckDuckGo launched a new ad/awareness campaign around the concept of Filter Bubbles, promoting DuckDuckGo as a way of escaping the filter bubbles surrounding your searches on other search engines such as Google or Bing. My initial reaction was “I’m not sure what to think.” Bubbles seem like a credible potential problem, but DDG seemed to be jumping on a bandwagon on this one without any consideration to the complexity and subtlety of the filter bubble issue.

“Filter bubbles” is the name given by Eli Pariser to a potential effect that we’ve been aware of in the recommender systems community for several years (under terms such as “balkanization” and various diversity-related issues). The basic idea is this: as web services become more and more personalized, using recommender systems to tailor your experience to your interests, likes, and dislikes, you become isolated in an echo chamber of your own thoughts. If your news service decides what to show you based primarily on what you like reading, you may well get a one-sided or otherwise narrow view of current events. You may not come in contact with other views as often, and perhaps forget they exist. If your music service only plays music it knows you like, you may not find anything new.

Read more...

Getting Things Typed: External Trusted Systems for Programming

One of the major tenets of David Allen’s Getting Things Done methodology is the concept of an external trusted system: a system for storing information outside your brain so that it can be retrieved as needed and/or brought to your attention when appropriate. Our brains are often fickle, and we are apt to forget things. Further, by trying to remember them, we spend mental energy trying not to forget them so that, even if we do remember, our productivity is decreased by the stress of trying not to forget. Getting notes, appointments, tasks, and pretty much anything else we need to remember out of our heads and into a reliable external storage and retrieval system enables us to free up our minds to focus on what we really want to accomplish.

I’ve been realizing lately that robust static type and module systems fill a similar role when programming. I have better things to do with my brain cycles than remember the details of functions, what they require, and where they are used.
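A small sketch of what this can look like in OCaml. The types and the `rating_key` function are hypothetical, invented only for illustration: the point is that the signature records what the function requires, so the compiler remembers it instead of the programmer.

```ocaml
(* Hypothetical wrapper types: both hold an int, but the compiler
   keeps them apart so they cannot be swapped by accident. *)
type user_id = UserId of int
type item_id = ItemId of int

(* The signature user_id -> item_id -> string states exactly what
   the function needs and in which order. *)
let rating_key (UserId u) (ItemId i) : string =
  Printf.sprintf "%d:%d" u i

let key = rating_key (UserId 42) (ItemId 7)

(* Writing rating_key (ItemId 7) (UserId 42) is a compile-time type
   error, so there is no need to remember the argument order. *)
```

If the function's requirements change later, every call site that no longer fits is reported by the compiler, which also covers the "where they are used" part.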

Read more...

Fixing the Dash Lights on a Dodge Caravan

We had a problem this last week with our ’03 Dodge Grand Caravan — the dash lights went out. Completely. Instrument panel, radio, heater controls — all unlit. My first thought, naturally, was a fuse.

However, when I looked at the fuse box, I couldn’t find any fuse that looked like it controlled the instrument panel backlighting. Web searching turned up a few things, including a fixya entry and a DodgeTalk.com forum post which document the same problem and an odd fix: disconnect the battery or otherwise cut power to the computer.

Read more...

Why I'm a Two-Spacer

How many spaces should you have after a sentence? As this piece in Slate lays out, the modern typographic rule is one. The two-space rule is an artifact of typewriters, whose monospaced typeface made two spaces a semi-necessary means of delineating the sentence space in a setting that afforded none of the spatial flexibility used by typographers to make paragraphs aesthetically pleasing. Modern use of double spaces after sentences, the story goes, is simply the residual effect of a bad habit required by inferior and now-obsolete technology.

So why do I put two spaces at the ends of my sentences?

Read more...

Idea for OCaml browser extension

This week, a variety of things clicked into place for an idea.

I was thinking about how to do some cool web development in OCaml, supporting fast computation and manipulation in the browser. There exist a few projects, such as ocamljs and OBrowser, for running OCaml in the browser; ocamljs compiles it to JavaScript, and OBrowser interprets OCaml byte code in JavaScript.

Read more...

Principle of Least Yak-Shaving

Thanks to a tweet yesterday, the concept of yak shaving has been running through my mind. The Jargon File gives this definition of the term:

yak shaving: [MIT AI Lab, after 2000: orig. probably from a Ren & Stimpy episode.] Any seemingly pointless activity which is actually necessary to solve a problem which solves a problem which, several levels of recursion later, solves the real problem you’re working on.

Read more...

Web privacy

Today, I made some changes to our web server code and our privacy policy. The primary effect of these changes is that we no longer record the IP addresses of visitors to elehack.net. This change was prompted by our discovery of the search engine Duck Duck Go and particularly its privacy policy.

As you browse the web, a good deal of information is sent to web sites you view. I want to take this opportunity to provide a run-down of what some of this information is and how it can be used.

Read more...

Tuning the OCaml memory allocator for large data processing jobs

TL;DR: setting OCAMLRUNPARAM=s=4M,i=32M,o=150 can make your OCaml programs run faster. Read on for details and how to see if the garbage collector is thrashing and thereby slowing down your program.
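For reference, a sketch of applying the same settings from inside a program with the standard `Gc` module, on the assumption that `s`, `i`, and `o` correspond to `minor_heap_size`, `major_heap_increment`, and `space_overhead` (sizes in words, with `M` meaning 2^20). Printing the collection counters at the end is just one possible way to get a rough picture of how hard the collector is working, not necessarily the method the full post describes. Newer OCaml runtimes may ignore `major_heap_increment`.

```ocaml
(* Assumed in-program equivalent of OCAMLRUNPARAM=s=4M,i=32M,o=150. *)
let () =
  Gc.set { (Gc.get ()) with
           Gc.minor_heap_size = 4 * 1024 * 1024;        (* s=4M words *)
           Gc.major_heap_increment = 32 * 1024 * 1024;  (* i=32M words *)
           Gc.space_overhead = 150 }                    (* o=150 *)

let () =
  (* ... the data processing job would run here ... *)
  let st = Gc.quick_stat () in
  (* Very high collection counts relative to the work done can hint
     that the collector is running too often. *)
  Printf.printf "minor collections: %d, major collections: %d\n"
    st.Gc.minor_collections st.Gc.major_collections
```

Setting the environment variable has the advantage of needing no code change, so it is easy to try different values on an existing binary.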

In my research work with GroupLens, I do most of my coding for data processing, algorithm implementation, etc. in OCaml. Sometimes I have to suffer a bit for this when some nice library doesn’t have OCaml bindings, but in general it works out fairly well. And every time I go to do some refactoring, I am reminded why I’m not coding in Python.

Read more...
