Richard Branson hit the UK news recently with some comment/statement/assertion about his employees not having set vacation limits or a policy to follow. Unsurprisingly, it reminded me of the Netflix HR non-policy slides that hit the tech sphere a few years back. I was interested in how Branson's take was one thin slice of a bigger picture articulated by Netflix. Assuming that the Netflix model works, would taking one part of it in isolation work? Probably not.
Netflix Culture: Freedom and Responsibility slides on SlideShare.
Paper
MOST WINNING A/B TEST RESULTS ARE ILLUSORY Martin Goodson (DPhil) Jan 2014
Summary
Demonstrates that standard statistical techniques apply equally to A/B testing, and that ignoring them can result in erroneous conclusions being drawn from A/B test results.
- statistical power
- multiple testing
- regression to the mean
Statistical Power
Simply put, the larger the sample you measure, the greater the power of the result, where power is the probability that the test indicates a difference when there really is a difference.
For A/B testing this means you need to run an experiment for long enough that what you're measuring is actually a difference. The paper includes a methodology for calculating sample size.
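To get a feel for the numbers, here is a minimal sketch of a sample size estimate. It uses the textbook normal-approximation formula for comparing two proportions, which may not be the exact methodology in the paper. The function name, the 5% significance level, the 80% power and the conversion rates are all my assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH variant of a two-sided A/B test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Hypothetical numbers: 5% baseline conversion, hoping to detect a lift to 6%.
    print(sample_size_per_variant(0.05, 0.06))
```

With those hypothetical rates the estimate comes out at roughly eight thousand visitors per variant, and halving the uplift you want to detect pushes the figure up sharply.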
Multiple testing
• Performing many tests, not necessarily concurrently, will multiply the probability of encountering a false positive.
• False positives increase if you stop a test when you see a positive result.
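The first bullet is easy to put a number on. A minimal sketch, assuming the tests are independent, that none of them has a real effect, and that each uses the same significance level (the function name and the 0.05 default are mine, not the paper's):

```python
def familywise_false_positive_rate(num_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    # Each test avoids a false positive with probability (1 - alpha).
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    for num_tests in (1, 5, 20):
        print(num_tests, round(familywise_false_positive_rate(num_tests), 3))
```

Under those assumptions, 20 tests at the 5% level give roughly a 64% chance of at least one false positive. The sketch does not model the second bullet (stopping early), which inflates the rate further.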
Regression to the mean
Over a period of time even random results will regress to the mean. If you use a smaller time window you may identify early winners that are in fact random winners. Look out for the trends over time — if an initial uplift in A/B tests falls you may be observing regression to the mean.
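A small simulation makes this concrete. This is only a sketch under my own assumptions: every test is an A/A test (both arms convert at the same true rate, so any uplift is pure noise), and the number of tests, visitors per window and conversion rate are made-up values.

```python
import random


def simulate_early_winners(num_tests=1000, visitors=200, rate=0.1, seed=42):
    """Run A/A tests (no real difference) and compare early vs. later uplift
    for the tests that looked like the biggest winners early on."""
    rng = random.Random(seed)

    def observed_rate():
        # Fraction of simulated visitors who convert at the true rate.
        return sum(rng.random() < rate for _ in range(visitors)) / visitors

    results = []
    for _ in range(num_tests):
        early_uplift = observed_rate() - observed_rate()  # variant minus control, window 1
        later_uplift = observed_rate() - observed_rate()  # same comparison, window 2
        results.append((early_uplift, later_uplift))

    # Keep the 10% of tests with the largest early uplift: the "early winners".
    results.sort(reverse=True)
    winners = results[: num_tests // 10]
    mean_early = sum(early for early, _ in winners) / len(winners)
    mean_later = sum(later for _, later in winners) / len(winners)
    return mean_early, mean_later


if __name__ == "__main__":
    early, later = simulate_early_winners()
    print("early uplift of winners:", early)
    print("later uplift of the same tests:", later)
```

The early winners show a clear positive uplift in the first window, yet the same tests average close to zero in the second window, because there was never a real effect to find.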
Final quote
You can increase the robustness of your testing process by following this statistical standard practice:
• Use a valid hypothesis - don’t use a scattergun approach
• Do a power calculation first to estimate sample size
• Do not stop the test early if you use ‘classical methods’ of testing
• Perform a second ‘validation’ test repeating your original test to check that the effect is real
A post to capture Git related goodness. The secret sauce? Branch early and often but merge and kill branches too.
Updated: 2014-10-03
Fragments
Basics
Create a directory with your project
e.g. ~\project
git init
To create a Git repo. If you want a bare repo (i.e. because you're going to push to it from somewhere else) use git --bare init.
git add <file>
To add a file (or files) to the repo, or to stage a file. Staged files are like a snapshot of what you're going to commit before you actually commit (commit to committing!). This means you can stage files as you work on them and then commit an atomic piece of work.
Git will flag untracked files. If you want to ignore files, add them to a .gitignore file. For example:
Will ignore anything in the \output subdirectory. You might want to ignore *.jpg for example, but binary files and Git is a whole other story (search for GitMedia for example).
git status
To see what's what.
git commit -m "Commit message"
Commit whatever is staged to the repo.
git rm <file>
Remove a file from the repo and from the local directory.
git log
Get useful stuff from git like a log...
git log -p -2
...loads of options, like this one, which shows the diffs (-p) for the last 2 (-2) commits.
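If you'd rather drive the basics from a script, here is a minimal Python sketch that strings them together in a scratch directory. The helper name git(), the throwaway identity, the file names and the output/ ignore pattern are all my assumptions for illustration, and it needs git on the PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command in cwd and return its standard output as text."""
    # Throwaway identity (an assumption) so commits work without global git config.
    identity = ["-c", "user.name=Example", "-c", "user.email=example@example.com"]
    result = subprocess.run(["git", *identity, *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout


def basic_workflow():
    """Create a scratch repository, stage and commit files, return (status, log)."""
    with tempfile.TemporaryDirectory() as project:
        git("init", cwd=project)
        # Hypothetical ignore pattern: skip everything in the output subdirectory.
        Path(project, ".gitignore").write_text("output/\n")
        Path(project, "readme.txt").write_text("Hello\n")
        git("add", ".gitignore", "readme.txt", cwd=project)   # stage both files
        git("commit", "-m", "Commit message", cwd=project)    # commit whatever is staged
        status = git("status", "--short", cwd=project)        # empty when nothing is pending
        log = git("log", "-p", "-2", cwd=project)             # diffs for the last 2 commits
        return status, log


if __name__ == "__main__":
    if shutil.which("git"):
        print(basic_workflow()[1])
```

Passing the command as a list of arguments avoids shell quoting problems, and check=True makes a failing git command raise an exception instead of passing silently.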
Want a GUI? Use:
Remote Repositories
Show remote repositories
git remote show <remotename>
extra info about remotename
git remote add <remotename> <URL>
Connect to a remote
Grab everything from a remote but won't merge
Works if you have a branch setup to track a remote branch (although in my tests it just worked??!?)
got
git push <remotename> <branchname>
Push your changes back to the remote - but beware if someone else has also pushed!
Creates a local repo in a new directory with remotename=origin
git remote rm <remotename>
remove remote
Note: you can use UNC paths on Windows, but use forward slashes, i.e.
git remote add origin //home-server/git/Crowmarsh/Crowmarsh.git
Tagging
git tag -a v0.1 -m "Tag: version 0.1 alpha"
Tag with annotation (v0.1)
then:
Important: git won't push tags to remote servers automagically. In that sense they're like branches, so you do:
git push <remotename> <tag>
e.g. git push origin v0.1
git push <remotename> --tags
will push all tags...!
Branching
Branches are lightweight. Effectively they are a pointer to the git object that represents the commit.
git branch <branchname>
Creates a new branch pointing to the last commit.
git checkout <branchname>
Switches to the branch. Switches is a significant piece of terminology; think of it as switching rather than checking out.
Details of branches....
Basic Merging
First, checkout the branch you want to merge into, then:
git merge <branchname>
If you get something like
CONFLICT (content): Merge conflict in readme.txt
then you'll need to open the file(s) and edit to resolve the conflicts.
Then commit as normal...
Note that merge does a 3-way merge between the two latest branch snapshots and the most recent common ancestor of the two.
git commit -a -m "Message"
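For a branch-and-merge round trip you can run end to end, here is a minimal Python sketch in a scratch repository. The helper git(), the branch name feature and the file contents are hypothetical, and it needs git on the PATH. Because nothing else is committed on the starting branch in the meantime, this particular merge is conflict-free.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command in cwd and return its standard output as text."""
    # Throwaway identity (an assumption) so commits work without global git config.
    identity = ["-c", "user.name=Example", "-c", "user.email=example@example.com"]
    result = subprocess.run(["git", *identity, *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout


def branch_and_merge():
    """Commit on a new branch, merge it back, and return the merged file's text."""
    with tempfile.TemporaryDirectory() as project:
        readme = Path(project, "readme.txt")
        git("init", cwd=project)
        readme.write_text("first line\n")
        git("add", "readme.txt", cwd=project)
        git("commit", "-m", "Initial commit", cwd=project)
        # Remember the starting branch; its name depends on local git configuration.
        start = git("symbolic-ref", "--short", "HEAD", cwd=project).strip()

        git("branch", "feature", cwd=project)      # new branch pointing at the last commit
        git("checkout", "feature", cwd=project)    # switch to it
        readme.write_text("first line\nsecond line\n")
        git("commit", "-a", "-m", "Add a line", cwd=project)

        git("checkout", start, cwd=project)        # switch to the branch to merge into
        git("merge", "feature", cwd=project)       # no competing commits, so no conflict
        return readme.read_text()


if __name__ == "__main__":
    if shutil.which("git"):
        print(branch_and_merge())
```

After the checkout back to the starting branch the file holds only the first line again; the merge brings the second line across.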
Mark has a useful blog post covering why you should
rather than
Remote Branches
There's a branch on your remote repos too; typically this is initially origin/master (i.e. the origin remote and its master (or HEAD) branch). If you do work locally and then push to the remote, all is fine. Except, that is, if someone else has pushed to the remote in the meantime so that origin/master is now ahead. Then you need to fetch...
You can checkout a local tracking branch that git knows is related to a remote branch, and so automagically pushes changes to the right place without you having to tell it.
git checkout --track <remote>/<branch>
If you have a local repo and want to push to a (bare) remote repo, use
git remote add <name> <url>
git push --set-upstream <name> <branch>
Rebasing
There are two ways to integrate changes from one branch into another: merging (see above) and rebasing. Instead of a 3-way merge, rebasing takes all the changes committed on one branch and replays them (rebases) on another.
Switch to the branch, then
git rebase master
to replay the commits of the branch you checked out on top of master.
Merging
Merging from a remote branch (i.e. two developers have pushed changes to the server in the same branch).
git fetch
git rebase origin/master
See http://stackoverflow.com/questions/7200614/how-to-merge-remote-master-to-local-branch
Tracking
Tracking branches means Git is aware of the relationship between a local branch and a remote branch. For example, when a branch is tracked, git status automagically tells you whether you're ahead of or behind the remote branch.
git checkout --track origin/development
Will checkout a remote branch (origin/development) locally and set the relationship between them. Alternatively, you can set a tracking relationship when pushing your local branch. i.e.
git push -u origin development
git branch -vv
Provides details of tracked relationships known to Git.
Misc
Fix a detached head:
From http://stackoverflow.com/questions/10228760/fix-a-git-detached-head
Overwrite local files
If you want to overwrite local, uncommitted, files you can use:
git fetch --all
git reset --hard origin/master
(From Stackoverflow)
Reference material
A fragment to execute a process on Windows:
from subprocess import Popen
proc = Popen("vi \"" + filename + "\"", shell=True )
print(proc)
Which opens filename with vi (whatever that might be on a Windows box). Note that while Python is pretty good from a crossplatform perspective, it's not the best when it comes to executing other system processes. So, on other platforms, use with care.
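A somewhat more portable variation is subprocess.run with the command given as a list, which skips the shell and its quoting rules. This is a minimal sketch rather than a drop-in replacement for the fragment above: the function name is mine, and the child process here is just the Python interpreter so the example runs anywhere.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command given as a list of arguments; return (exit code, stdout)."""
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout


if __name__ == "__main__":
    # sys.executable is the running Python interpreter, so this works on any platform.
    code, output = run_and_capture(
        [sys.executable, "-c", "print('hello from a child process')"]
    )
    print(code, output.strip())
```

Unlike Popen, run waits for the child to finish, so it suits commands whose output you want back rather than an editor you want to leave open.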
A fragment for writing to a file in Python:
f = open(filename,'w')
f.write("Hey, I'm saying Hello file! \n")
f.close()
Note that this will overwrite an existing file. Use open(filename,'a') to open in append mode. The Python documentation has details.
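The same idea with a with block, which closes the file for you even if the write raises an exception. A minimal sketch; the function name and the temporary file are just for illustration.

```python
import os
import tempfile


def write_then_append(path):
    """Overwrite path with one line, append a second, and return the contents."""
    with open(path, "w") as f:      # 'w' truncates any existing file
        f.write("Hey, I'm saying Hello file!\n")
    with open(path, "a") as f:      # 'a' keeps existing content and adds to the end
        f.write("And hello again.\n")
    with open(path) as f:           # default mode is read
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_then_append(os.path.join(tmp, "hello.txt")))
```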
A fragment for processing arguments passed from the command line:
import sys
#… stuff
myArg = sys.argv[1] # argv[0] is the name of the command
For more sophisticated parsing, StackOverflow has recommendations.
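One option from the standard library is argparse. A minimal sketch, with a made-up positional argument and flag:

```python
import argparse


def build_parser():
    """Describe the command line: one required file name and an optional flag."""
    parser = argparse.ArgumentParser(description="Process a file.")
    parser.add_argument("filename", help="file to process")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print extra detail")
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:], as a real command line would.
    args = build_parser().parse_args(argv)
    return args.filename, args.verbose
```

Calling main(["notes.txt", "-v"]) returns ("notes.txt", True). You also get a --help message and an error on missing arguments for free.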
Python's SimpleHTTPServer is my go-to friend for serving HTTP requests locally while developing stuff. Sometimes, however, you get caught with the browser caching things that you really don't want it to. CTRL+F5 usually works (it's a hardcoded pattern for me by now) but today I stumbled on a gotcha: cached JSON content that a page was pulling in.
The solution, for me, was to modify SimpleHTTPServer to set HTTP headers to tell the browser not to cache. Here's the code:
#!/usr/bin/env python
import SimpleHTTPServer

class MyHTTPRequestHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def end_headers(self):
        self.send_my_headers()
        SimpleHTTPServer.SimpleHTTPRequestHandler.end_headers(self)

    def send_my_headers(self):
        self.send_header("Cache-Control", "no-cache, no-store, must-revalidate")
        self.send_header("Pragma", "no-cache")
        self.send_header("Expires", "0")

if __name__ == '__main__':
    SimpleHTTPServer.test(HandlerClass=MyHTTPRequestHandler)
Usage: python serveit.py 8080 to serve the local directory on port 8080.
Source: The ever helpful stackoverflow
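SimpleHTTPServer is a Python 2 module; in Python 3 the same handler lives in http.server. Here is a minimal sketch of the same trick for Python 3. The class name, the serve() helper and the choice to bind to 127.0.0.1 only are my assumptions, not part of the original answer.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class NoCacheRequestHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory with caching disabled."""

    def end_headers(self):
        # Add the no-cache headers just before the header block is closed.
        self.send_header("Cache-Control", "no-cache, no-store, must-revalidate")
        self.send_header("Pragma", "no-cache")
        self.send_header("Expires", "0")
        super().end_headers()


def serve(port=8080):
    """Serve the current directory on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), NoCacheRequestHandler).serve_forever()
```

Call serve(8080) from the directory you want to serve and stop it with CTRL+C.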