Thursday, June 26, 2014

The Great Stagnation, Football Ball Edition

'Since 2002, panel numbers have roughly halved every four years: 32 in 2002, 14 in 2006 and eight in 2010. Thus, by the 2022 World Cup, players should be kicking a single-panel ball around the pitch,' claimed Ken Bray here. I will call this Bray's law: 'The number of panels on the World Cup football will halve every tournament'. This is not quite as epoch-defining as Moore's law, but still cool, I think.

According to that projection, this year's World Cup ball should have at most 4 panels. Instead the ball has 6: 50% more panels than you would expect if progress had continued at the rate Bray predicted. The Great Stagnation is the belief that things are not improving as quickly as they used to, and it is used to explain why we still have homeless people but not flying cars.
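Bray's law boils down to simple halving arithmetic. A minimal Python sketch, assuming a strict halving every four years starting from the Jabulani's eight panels in 2010 (Bray himself only says 'roughly halved'):

```python
# Project World Cup ball panel counts under "Bray's law".
# Assumption: the count halves exactly once per four-year tournament.
def bray_projection(start_year, start_panels, end_year):
    """Return {year: projected panel count} from start_year to end_year."""
    projection = {}
    panels = float(start_panels)
    for year in range(start_year, end_year + 1, 4):
        projection[year] = panels
        panels /= 2
    return projection

projected = bray_projection(2010, 8, 2022)
for year, panels in projected.items():
    print(year, panels)

actual_2014 = 6  # the Brazuca
excess = actual_2014 / projected[2014] - 1
print(f"2014: {actual_2014} panels, {excess:.0%} more than projected")
```

The projection reaches a single panel in 2022, matching Bray's prediction, and the 2014 ball comes out 50% above it.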

The number of panels on each World Cup ball is taken from each ball's individual Wikipedia page.

2014 the Adidas Brazuca: 'The ball has been made of six polyurethane panels'

2010 the Adidas Jabulani: 'The ball was constructed consisting of eight (down from 14 in the 2006 World Cup) thermally bonded, three-dimensional panels'

2006 the Adidas Teamgeist: 'The Teamgeist ball differs from previous balls in having just 14 curved panels rather than the 32 that have been standard since 1970. Like the 32 panel Roteiro which preceded it'

Fewer panels make the ball smoother, so it should fly truer. How true a ball flies depends on several variables other than the panel count, though; the 2010 ball, for example, was notorious for its wobbly flight. So just reducing the number of panels at the expense of the ball's overall quality is a bad idea, and that may explain why Bray's law has failed. Still, not being able to make a ball work with fewer panels suggests to me a slowdown in technological innovation. The aerodynamics of soccer balls, and why there is a race to fewer panels, are described in Bray's article 'A fly walks round a football'.

Wednesday, March 12, 2014

Let's All Move to No Insurance Land

Ryanair have finally set up their own country. To decline travel insurance, you have to select that option from the country of residence drop-down. Because the obvious UX for declining insurance is clearly to choose a country, this makes complete sense.

If you are going to have a new country of 'don't insure me', it is of course not in some sort of non-country section of the drop-down but resides just after Denmark.

I've talked before about how, if people did what Ryanair asked, Ryanair would disappear. But I still like them; I just like pointing out when a company acts oddly.

There are degrees of hiding extra charges from people. Setting up your own country to get an extra few quid out of them really shows commitment.

Sunday, January 05, 2014

Goodreads Recruitment Hack

I entered some of the node.js books I have been reading into my Goodreads list, and Goodreads mentioned that they were recruiting.
The programmer who told recruiters about GitHub should meet the same fate as the police officer at the end of The Wicker Man. But at the risk of the same thing happening to me: this is a really clever idea. If you are looking for people who know about an area, checking whether they read the books is one way to find them. Only Goodreads can advertise on its own site, but anyone can look up book reviews. The other surprising thing is that, with 800k followers, Goodreads' Twitter mentions must be like watching the digital rain from The Matrix. But they noticed that I mentioned their recruitment idea and replied.

Thursday, November 14, 2013

Wheat Map of the US

I thought it would be cool to make a map of US counties by how much wheat they grew. I took the code from this article and from the Visualize This book by Nathan Yau.

I got the wheat data from the US Department of Agriculture here. The map of the US comes from here.

Then I cleaned up the data by keeping only the columns for state, county and total wheat production. The dataset includes counties numbered 888 and 999, but these seem to be combined totals for a state's counties, so I stripped them out. There are also more than 50 'states' in these county datasets, which seems to be standard. As always with these sorts of manipulations, numbers get read in as strings, so some casting is needed.
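The cleaning steps above could look something like this in Python. It is a sketch rather than the original code: the column names `State`, `County` and `Total` are hypothetical stand-ins for the USDA headers, and stripping thousands separators assumes the production figures may contain commas.

```python
import csv
import io

# County codes that hold combined rows rather than real counties.
AGGREGATE_COUNTIES = {"888", "999"}

def clean_wheat_rows(reader):
    """Keep (state, county, total production), drop aggregate rows, cast totals to int.

    'State', 'County' and 'Total' are hypothetical column names.
    """
    cleaned = []
    for row in reader:
        county = row["County"].strip()
        if county in AGGREGATE_COUNTIES:
            continue
        raw_total = row["Total"].replace(",", "").strip()
        if not raw_total.isdigit():
            # Blank or non-numeric cell: skip it rather than guess a value.
            continue
        cleaned.append((row["State"].strip(), county, int(raw_total)))
    return cleaned

# Made-up rows purely to exercise the function (not USDA figures).
sample = io.StringIO(
    "State,County,Total\n"
    "20,001,\"1,500\"\n"
    "20,888,\"9,999\"\n"
    "20,003,\n"
)
rows = clean_wheat_rows(csv.DictReader(sample))
print(rows)
```

Skipping unparseable cells keeps the map honest: a county with no number simply stays uncoloured instead of being drawn as zero.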

The SVG is 1.9 MB, and Google Drive does not want to store or convert it at the moment, but if anyone wants it I can send it to them. At this file quality, zooming in on an individual state, like Kansas, works fine.

The code to create this picture is here.

JDLong on Twitter pointed out where to get data for countries. I got the grains data from here, and running 'head psd_grains_pulses.csv' shows the file layout.

I want the Country_code and value for the commodity wheat, for every country, in the most recent year. The country code is 2 characters (ISO 3166-1 alpha-2), and the map I have from Wikipedia uses that format; you can get it here.
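Picking out the most recent wheat figure for each country is a small filtering job. A minimal Python sketch, assuming column names `Commodity_Description`, `Attribute_Description`, `Country_Code`, `Market_Year` and `Value` (assumptions, so check them against the `head` output) and assuming the file holds several attributes per country and year, of which production is the one wanted:

```python
import csv
import io

def latest_wheat_by_country(reader, commodity="Wheat", attribute="Production"):
    """Return {country_code: (year, value)} for each country's most recent year.

    All column names here are assumed, not confirmed from the file.
    """
    latest = {}
    for row in reader:
        if row["Commodity_Description"] != commodity:
            continue
        if row["Attribute_Description"] != attribute:
            continue
        code = row["Country_Code"]
        year = int(row["Market_Year"])
        value = float(row["Value"])
        # Keep only the newest year seen so far for this country.
        if code not in latest or year > latest[code][0]:
            latest[code] = (year, value)
    return latest

# Made-up rows to show the shape of the result.
sample = io.StringIO(
    "Commodity_Description,Attribute_Description,Country_Code,Market_Year,Value\n"
    "Wheat,Production,FR,2012,100\n"
    "Wheat,Production,FR,2013,120\n"
    "Wheat,Area Harvested,FR,2013,50\n"
    "Barley,Production,FR,2013,70\n"
)
result = latest_wheat_by_country(csv.DictReader(sample))
print(result)
```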

The code to produce colors for each country based on this data is here. Again, this is based on the "Visualize This" book by Yau. The CSS it generates to set each country's color gets pasted into the style section of the BlankMap-World6.svg file. I should read all the documentation describing the values before doing any analysis like this, but I am only making pretty pictures in Python, so I am making assumptions to work quickly.
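The coloring step in this style of map boils down to putting each value into a bin and writing one CSS rule per country. A rough Python sketch; the palette, the bin edges and the `.fr`-style class selectors are all assumptions (if the SVG identifies countries by `id`, use `#` instead of `.`):

```python
from bisect import bisect_right

# Light-to-dark greens; the exact hex values are an arbitrary choice.
PALETTE = ["#edf8e9", "#bae4b3", "#74c476", "#31a354", "#006d2c"]

def css_for_countries(values, thresholds, palette=PALETTE):
    """Return CSS rules coloring each country by the bin its value falls in.

    values: {iso_alpha2_code: number}
    thresholds: ascending bin edges, exactly one fewer than the colors.
    """
    if len(thresholds) != len(palette) - 1:
        raise ValueError("need exactly one fewer threshold than colors")
    rules = []
    for code, value in sorted(values.items()):
        # bisect_right counts how many edges the value has reached.
        color = palette[bisect_right(thresholds, value)]
        rules.append(f".{code.lower()} {{ fill: {color}; }}")
    return "\n".join(rules)

css = css_for_countries({"FR": 120.0, "IE": 0.5, "US": 900.0}, [1, 10, 100, 500])
print(css)
```

Fixed bin edges (rather than scaling linearly) keep a few huge producers from washing every other country out to the palest color.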

Extra: I made a stacked area graph of which crops have been grown when here, with the code here.

Sunday, December 30, 2012

World Cup 2010 Heatmap

I am reading Visualize This by Yau at the moment. It is full of really pretty visualization ideas and examples. One of them is a heatmap of NBA players, so to practice this visualization I have made one of World Cup 2010 players. I got the dataset from the Guardian Datablog post 'World Cup 2010 statistics: every match and every player in data'. The data only quantifies five attributes per player, but that is good enough to practice making heatmaps.
The R code I used is below:
library(RColorBrewer)
#save the Guardian data to World.csv and load it
players2<-read.csv('World.csv', sep=',', header=TRUE)
players2[1:3,]
#players with the same name (like Torres) meant I had to merge surnames and countries
players2$Name <-paste(players2[,1], players2[,2])
rownames(players2) <- players2$Name
###I removed one player by hand
###I now do not need these columns
players2$Position <- NULL
players2$Player.Surname <- NULL
players2$Team <- NULL
players3 <-players2[order(players2$Total.Passes, decreasing=TRUE),]
### or to order by time played
###players3 <-players2[order(players2$Time.Played, decreasing=TRUE),]
players3 <- players3[,1:5]
players4<-players3[1:50,]
players_matrix <-data.matrix(players4)
###change names of columns to make graph readable
colnames(players_matrix )[1] <- "played"
colnames(players_matrix )[2] <- "shots"
colnames(players_matrix )[3] <- "passes"
colnames(players_matrix )[4] <- "tackles"
colnames(players_matrix )[5] <- "saves"
players_heatmap <- heatmap(players_matrix, Rowv=NA, Colv=NA, col = brewer.pal(9, 'Blues'), scale='column', margins=c(5,10), main="World Cup 2010")
dev.print(file="SoccerPassed.jpeg", device=jpeg, width=600)       
#players_heatmap <- heatmap(players_matrix, Rowv=NA, Colv=NA, col = brewer.pal(9, 'Greens'), scale='column', margins=c(5,10), main="World Cup 2010")
#dev.print(file="SoccerPlayed.jpeg", device=jpeg, width=600) 
dev.off()
Nothing very fancy here. Just showing that with a good data source and some online tutorials it is easy enough to knock up a picture in a fairly short time.
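The `scale='column'` argument matters here: without it, minutes played would dominate the color scale. R's `heatmap` centres each column on its mean and divides by its standard deviation before coloring. A minimal Python sketch of that step, plus the sort by passes, on made-up numbers (the player names and figures are hypothetical):

```python
from statistics import mean, stdev

def scale_columns(rows):
    """Centre each column on its mean and divide by its sample standard deviation.

    A constant column would give NaN in R; this sketch returns 0.0 instead.
    """
    columns = list(zip(*rows))
    stats = [(mean(col), stdev(col)) for col in columns]
    return [
        [(value - m) / s if s else 0.0 for value, (m, s) in zip(row, stats)]
        for row in rows
    ]

# Made-up numbers: played, shots, passes, tackles, saves.
players = {
    "Player A": [270, 4, 150, 6, 0],
    "Player B": [180, 1, 90, 9, 0],
    "Player C": [270, 0, 60, 1, 12],
}
# Order by passes (index 2), most first, like order(..., decreasing=TRUE).
ranked = sorted(players, key=lambda name: players[name][2], reverse=True)
scaled = scale_columns([players[name] for name in ranked])
for name, row in zip(ranked, scaled):
    print(name, [round(v, 2) for v in row])
```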

Monday, December 24, 2012

The Price Of Guinness

When money's tight and hard to get 
And your horse has also ran, 
When all you have is a heap of debt - 
A PINT OF PLAIN IS YOUR ONLY MAN.
Myles Na Gopaleen

How much has Guinness increased in price over time? Below is a graph of the price changes. The data is taken from a combination of the Guinness price index and CSO data

The R code for this graph is below.

pint<-read.csv('pintindex.csv', sep=',', header=TRUE)
plot(pint$Year, pint$Euros, type="s", main="Price Pint of Guinness in Euros", xlab="Year", ylab="Price Euros", bty="n", lwd=2)
dev.print(file="Guinness.jpeg", device=jpeg, width=600)       
dev.off() 
Paul in the comments asked a good question: how does this compare to earnings?
 Year   Price (€)   Earnings/Price (pints)   Earnings per week (€)
 2008   4.22        167.31                   706.03
 2009   4.34        161.69                   701.73
 2010   4.20        165.02                   693.08
 2011   4.15        165.81                   688.11
 2012   4.23        163.56                   691.87
Here the earnings are average weekly earnings, the modern measure, which differs slightly from the average industrial wage that the Pint Index used. The table shows that even with a drop in the price of Guinness, the number of pints a week's wages can buy decreased. These figures are based on gross wages; tax increases probably made the situation worse in terms of net wages.
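The Earnings/Price column is just the number of pints a week's gross earnings would buy. A quick Python check that recomputes it from the other two columns of the table:

```python
# From the table: year -> (price of a pint in euro, average weekly earnings in euro)
data = {
    2008: (4.22, 706.03),
    2009: (4.34, 701.73),
    2010: (4.20, 693.08),
    2011: (4.15, 688.11),
    2012: (4.23, 691.87),
}

# Pints a week's gross earnings would buy = earnings / price
pints_per_week = {year: round(earn / price, 2) for year, (price, earn) in data.items()}
for year, pints in pints_per_week.items():
    print(year, pints)

change = pints_per_week[2012] - pints_per_week[2008]
print(f"2008 to 2012: {change:+.2f} pints per week")
```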

pintindex.csv is:

Year, Euros
1969, 0.2
1973, 0.24
1976, 0.48
1979, 0.7
1983, 1.37
1984, 1.48
1985, 1.52
1986, 1.64
1987, 1.73
1988, 1.8
1989, 1.87
1990, 1.93
1991, 2.02
1992, 2.15
1993, 2.24
1994, 2.34
1995, 2.42
1996, 2.5
1997, 2.52
1998, 2.65
1999, 2.74
2000, 2.88
2001, 3.01
2002, 3.24
2003, 3.41
2004, 3.54
2005, 3.63
2006, 3.74
2007, 4.03
2008, 4.22
2009, 4.34
2010, 4.2
2011, 4.15
2012, 4.23
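To put a single number on 'how much has Guinness increased', a compound annual growth rate between the first and last years is one option. A minimal Python sketch; for brevity it parses only three rows of pintindex.csv, but the full file works the same way:

```python
import csv
import io

def read_pint_index(text):
    """Parse 'Year, Euros' CSV text into a list of (year, price) pairs."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    return [(int(year), float(euros)) for year, euros in reader]

def annual_growth(points):
    """Compound annual growth rate between the first and last observations."""
    (y0, p0), (y1, p1) = points[0], points[-1]
    return (p1 / p0) ** (1 / (y1 - y0)) - 1

# Three rows taken from pintindex.csv.
sample = "Year, Euros\n1969, 0.2\n1990, 1.93\n2012, 4.23\n"
points = read_pint_index(sample)
print(f"{points[-1][1] / points[0][1]:.0f}x since {points[0][0]}")
print(f"about {annual_growth(points):.1%} a year")
```

The growth rate only looks at the endpoints, so it smooths over the 2010 and 2011 price drops.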

Wednesday, December 19, 2012

Cystic Fibrosis Improved Screening

In the first post I claimed that, as with Tay-Sachs in Israel, Cystic Fibrosis could be drastically reduced with some relatively inexpensive genetic testing. In the second, further analysis suggested that such genetic screening of the Irish population would pay for itself several times over. In this post I want to see whether some form of targeted screening could be shown to be as cost-effective as currently implemented screening.

Currently there is free screening for people who have relatives with CF, and for their partners. I assume second cousins count as relatives. Based on this paper and some consanguinity calculations, I calculate that an Irish couple in which one partner's second cousin has CF has about twice the general population's chance of having a child with CF. This means you can currently be tested for free if you have about a 1 in 700 chance of having a child with cystic fibrosis, whereas the general population has a 1 in 1444 chance. If screening can be focused so that it is twice as good as random screening, that should be enough, by current standards, for it to be rolled out.
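The paper's exact method isn't reproduced here, but a simple pedigree calculation lands close to these figures. A rough Python sketch under two stated assumptions: a carrier frequency of 1 in 19 (which reproduces 1 in 1444 for a random couple), and a 1/16 chance that the second cousin of someone with CF carries one of their CF alleles by descent (two shared great-grandparents, each path needing five 50/50 transmissions). It ignores any other inbreeding in the family.

```python
# Assumption: carrier frequency of 1 in 19, which gives the 1 in 1444
# figure for two unrelated partners: (1/19) * (1/19) * (1/4).
CARRIER_FREQ = 1 / 19

# Assumption: chance that an affected person's second cousin inherited one of
# that person's CF alleles by descent: 2 shared great-grandparents x (1/2)**5.
IBD_SECOND_COUSIN = 2 * (1 / 2) ** 5

def child_risk(p_carrier_a, p_carrier_b):
    """Chance a couple's child has CF: both partners carriers, then a 1/4 chance."""
    return p_carrier_a * p_carrier_b * 0.25

general = child_risk(CARRIER_FREQ, CARRIER_FREQ)

# The related partner is a carrier by descent or, failing that, at the
# background rate; the other partner is assumed unrelated.
relative_carrier = IBD_SECOND_COUSIN + (1 - IBD_SECOND_COUSIN) * CARRIER_FREQ
related = child_risk(relative_carrier, CARRIER_FREQ)

print(f"general population: 1 in {1 / general:.0f}")
print(f"second cousin has CF: 1 in {1 / related:.0f}")
print(f"ratio: {related / general:.1f}")
```

Under these assumptions the related couple's risk comes out at roughly 1 in 680, a little over twice the general figure.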

How could a non random screening be made this focused?

1. Geographic area. Some areas of the country might be more likely to have CF carriers than others. Targeting screening in these areas might make it twice as effective. The Cystic Fibrosis Registry of Ireland annual report 2010 gives numbers for Irish counties. 4 counties do not have their numbers listed but I have estimated these based on their population.

This map is based on the figures of people with CF found in the registry. This could be a biased sample or people could have moved. A better measure would be babies born with CF in each county.

The number of people with CF in each county might be useful for deciding how to allocate some treatment resources. For screening, though, the percentage of people with CF is more interesting. To work this out we also need the population of each county.

The number of people with CF in the registry per ten thousand people is shown in the last map (CFperPopIrl.jpeg).

I can send full-sized versions of these maps, or the R code I used to generate them, to anyone who wants them. The code I used is below:

library(RColorBrewer)
library(sp)
# The GADM county boundaries file defines a SpatialPolygonsDataFrame called gadm
con <- url("http://gadm.org/data/rda/IRL_adm1.RData")
load(con)
close(con)
# cases.csv needs one row per county, in the same order as the counties in gadm
people<-read.csv('cases.csv', sep=',', header=TRUE)
gadm$pops <- cut(people$cases,breaks=c(0,2,10,20,30,40,50,70,150,300))
myPalette<-brewer.pal(9,"Purples")
spplot(gadm, "pops", col.regions=myPalette, main="Cystic Fibrosis Cases Per County",
       lwd=.4, col="black")
dev.print(file="CFIrl.jpeg", device=jpeg, width=600)
dev.off()
# countypopths.csv: county populations in thousands, same county order
population<-read.csv('countypopths.csv', sep=',', header=TRUE)
gadm$pops <- cut(population$population,breaks=c(0,20,40,60,70,80,100,160,400,1300))

myPalette<-brewer.pal(9,"Greens")
spplot(gadm, "pops", col.regions=myPalette, main="Population in thousands",
       lwd=.4, col="black")
dev.print(file="PopIrl.jpeg", device=jpeg, width=600)       
dev.off()

gadm$cfpop <- people$cases/(population$population/10)
cfpop = cut(gadm$cfpop,breaks=c(0,0.5,1,1.5,2,2.5,3,3.5))
gadm$cfpop <- as.factor(cfpop)

myPalette<-brewer.pal(7,"Blues")
spplot(gadm, "cfpop", col.regions=myPalette, main="CF/Population Irish Counties",
       lwd=.4, col="black")
dev.print(file="CFperPopIrl.jpeg", device=jpeg, width=600)       
dev.off() 
If this result were replicated in a more complete analysis, just picking the darker counties could give the two-fold amplification needed for a test as strong as the currently funded ones.
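Whether 'picking the darker counties' reaches the two-fold target can be checked by comparing the combined CF rate of the selected counties with the national rate. A Python sketch with made-up counties (not registry figures). Under Hardy-Weinberg assumptions a random couple's risk of an affected child roughly equals the prevalence at birth, so if both partners come from the selected counties the rate ratio approximates the screening gain; registry prevalence is only a proxy for that.

```python
def rate_per_10k(cases, population_thousands):
    """CF registry cases per 10,000 people (population given in thousands)."""
    return cases / (population_thousands / 10)

def targeting_gain(counties, threshold):
    """Compare the CF rate in counties at or above `threshold` with the national rate.

    counties: {name: (cases, population_in_thousands)}
    Returns (selected county names, selected rate / national rate).
    """
    total_cases = sum(c for c, _ in counties.values())
    total_pop = sum(p for _, p in counties.values())
    national = rate_per_10k(total_cases, total_pop)
    selected = [name for name, (c, p) in counties.items()
                if rate_per_10k(c, p) >= threshold]
    if not selected:
        return [], 0.0
    sel_cases = sum(counties[n][0] for n in selected)
    sel_pop = sum(counties[n][1] for n in selected)
    return selected, rate_per_10k(sel_cases, sel_pop) / national

# Made-up counties, not registry figures.
toy = {"County A": (60, 200), "County B": (20, 200), "County C": (10, 100)}
picked, gain = targeting_gain(toy, threshold=2.0)
print(picked, round(gain, 2))
```

Pooling the selected counties before dividing (rather than averaging their rates) weights each county by its population, which is what a screening budget actually covers.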

2. Pick certain ethnic minorities. Some groups have higher levels of CF than the average population. For example, Travellers have higher levels of some disorders, although apparently not of CF: 'disorders, including Phenylketonuria and Cystic fibrosis, that are found in virtually all Irish communities and probably are no more common among Travellers than in the general Irish population. The second are disorders, including Galactosaemia, Glutaric Acidaemia Type I, Hurler’s Syndrome, Fanconi’s Anaemia and Type II/III Osteogenesis Imperfecta, that are found at much higher frequencies in the Traveller community than the general Irish population'. 'There is no proactive screening of the Traveller population no more than there is proactive screening of the non-traveller Irish population'. I do not think deliberate screening of one ethnic group, unless the group itself organises it, is a good idea. Singling out one ethnic group for screening risks stigmatising its members and reminds many of the horrors of eugenics.

3. Certain disorders seem to cluster with CF. 'In 1936, Guido Fanconi published a paper describing a connection between celiac disease, cystic fibrosis of the pancreas, and bronchiectasis'. Ireland also has the highest rate of celiac disease in the world (about 1 in 100). If CF and celiac disease, or some other observable characteristic, are also correlated in Ireland, testing people with celiac disease in their family could also amplify the effectiveness of a test.

4. Screening parents undergoing IVF. 'HARI was the first clinic in Ireland to offer IVF and it currently receives up to 800 enquiries a year specifically about the procedure. It carries out over 1,350 cycles of IVF treatment annually and over 3,500 babies have been born as a result. The Merrion Clinic carries out up to 500 cycles of IVF per year, while last year, SIMS carried out 1,063 cycles.' IVF is roughly 33% effective per cycle, so about 1,000 children (2,913 cycles × 0.33 ≈ 960) are born through IVF at these three Irish clinics each year. At the general rate of 1 in 1444, screening these parents would prevent fewer than one CF case per year. Screening people who use IVF does not prevent many cases, though people who know they are CF carriers can use IVF to avoid having a child with CF.
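The IVF arithmetic, spelled out in Python using the cycle counts quoted above, the rough 33% success rate per cycle and the 1 in 1444 general-population risk:

```python
# Cycles per year quoted above.
cycles = {"HARI": 1350, "Merrion Clinic": 500, "SIMS": 1063}
SUCCESS_PER_CYCLE = 0.33   # rough per-cycle success rate
CF_RISK = 1 / 1444         # general-population risk of a child with CF

births = sum(cycles.values()) * SUCCESS_PER_CYCLE
cases_avoided = births * CF_RISK
print(f"{sum(cycles.values())} cycles -> about {births:.0f} births")
print(f"expected CF cases among them: {cases_avoided:.2f} a year")
```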

Concerns about the privacy and security of a general genetic screening program for the Irish population should not be ignored. Cathal Garvey on Twitter pointed out that this screening would require 'With explicit informed consent & ensuing destruction of samples, Just wary of prior shenanigans of HSE bloodspot program. i.e. it's already fashionable among governments to abuse screening programs to create 'law' enforcement databases. Without clear guarantees against that, must weigh the costs of mass DNA false incriminations vs. gains of ntnl screening prog!' I agree that any genetic screening program for Ireland would have to ensure privacy for the individual.

Screening the general population for carriers of serious genetic disorders would save money and suffering. If the level of savings is not sufficient for general screening, focusing on certain locations, or on relatives of people who suffer from disorders that co-occur with CF, could amplify the returns enough to make it as useful as current screening.