Thursday, January 19, 2017

Measures for a successful Trump

What falsifiable metrics could be used to say whether Trump was successful, by his own aims and by the Republican party's?
Things he claims he will deliver:
1. Better healthcare: cover everybody, cost less and have lower deductibles.
2. More GDP growth. Obama never had a year of 3% economic growth: "Obama is the first president in modern history not to have a single year of 3 percent growth." If Trump can deliver an average of more than 3% over his four years in office, I think an impartial observer would agree the economy has done well.
3. A balanced budget.
4. Infrastructure improvements are a big part of Trump's promise. These are measured here.
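Point 2 is the easiest to check mechanically. A minimal R sketch, assuming you supply the four annual real GDP growth rates (in %) once they are published; none are given here. It reports the plain mean, which is what the 3% test above uses, and the compound average, which is never higher than the plain mean and is lower when growth varies from year to year.

```r
# Sketch: average annual GDP growth over a term, from yearly growth rates in percent.
# The rates are inputs you provide; this post does not give them.
average_growth <- function(growth_pct) {
  # Plain mean of the yearly rates: the "average of more than 3%" test
  arithmetic <- mean(growth_pct)
  # Compound (geometric) average: the constant yearly rate giving the same total growth
  compound <- (prod(1 + growth_pct / 100)^(1 / length(growth_pct)) - 1) * 100
  list(arithmetic = arithmetic, compound = compound, beats_3pct = arithmetic > 3)
}

# Usage with made-up placeholder rates, not real data
average_growth(c(2.5, 3.2, 3.8, 3.1))
```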

I would like to see carbon emissions improve, or at least not get worse, but Trump did not campaign on improving them. If carbon emissions increase as predicted, Trump is only doing what he campaigned on.

There are many things like this, but by picking a small number of things that they claim will improve, I want to make an easy-to-check test.

Trump and the Republican party aim to deliver 3%+ growth, a healthcare plan that covers more people and reduces deductibles, improved infrastructure, and an improving budget position. If they do not do this, by their own terms they have not succeeded.

Immigration and Birthrate

"Let’s talk about the link between immigration and low reproduction rates"
This is a really weird article. It talks about how below-replacement birth rates mean the population will decline, which is true by definition (absent migration).
Then it talks about how some countries have lots of immigrants. Then it does nothing to link the two. So in spite of asking to talk about the link, it doesn't.


As the article does nothing to show there is a link, I wanted to check first whether there is one.
I took a list of countries by their percentage of immigrants and one of countries by their birth rate.
I combined them into a dataset of Country, Birthrate and Immigrant % and put it here.
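Combining the two lists is a join on country name. A minimal R sketch, assuming both tables have a Country column and use the column names from the cor() call in this post (FertilityRate, ImmigrantPer); it is an illustration, not necessarily how the linked dataset was built.

```r
# Sketch: join a country/immigrant-% table to a country/fertility table and
# correlate the two columns. Column names are assumed, not taken from the sources.
combine_and_correlate <- function(immigrants, fertility) {
  # Inner join: countries missing from either table are dropped
  data <- merge(immigrants, fertility, by = "Country")
  r <- cor(data$FertilityRate, data$ImmigrantPer, use = "complete.obs")
  list(data = data, r = r, n = nrow(data))
}
```

Because the inner join silently drops countries that appear in only one list, spelling differences between sources (for example "Czech Republic" vs "Czechia") are worth checking before trusting n.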

The correlation between birth rate and the percentage of immigrants in a country is negative and weak:

> cor(data$FertilityRate, data$ImmigrantPer)
[1] -0.3463663
I am willing to bet, at odds, that the correlations between wealth and birth rate, and between wealth and the percentage of immigrants, are stronger: that having money causes immigrants to come to your country and causes you to have fewer children, not that people choose between having a child and a 25-year-old Ethiopian.
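That bet can be checked rather than argued. One standard way is a partial correlation: regress both fertility and immigrant % on wealth, then correlate what is left over. A minimal sketch, assuming a wealth column (for example log GDP per capita) is added to the dataset; no such column is in the data described here. If the wealth explanation is right, the partial correlation should be much closer to zero than -0.35.

```r
# Sketch: partial correlation of x and y controlling for z, computed by
# correlating the residuals of x ~ z and y ~ z.
partial_cor <- function(x, y, z) {
  rx <- resid(lm(x ~ z))  # part of x not explained by z
  ry <- resid(lm(y ~ z))  # part of y not explained by z
  cor(rx, ry)
}

# Hypothetical usage; GDPperCapita is NOT a column in the dataset above
# partial_cor(data$FertilityRate, data$ImmigrantPer, log(data$GDPperCapita))
```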

So, Irish Times, please do talk about what is at best a weak link between immigration and low reproduction rates.

Wednesday, January 18, 2017

Brexit 12 objectives

These are the 12 objectives for Britain’s Brexit negotiations, as set out by Prime Minister Theresa May.

Issues Brexiters really care about and will likely get
2. Control of our own laws
5. Control of immigration. Net migration to the UK now seems to be about 300k a year. The Tories promised to bring it below 100k. If net migration drops below 100k, that probably means the people who voted to leave the EU have the immigration control they want.

Things that are not measurable
1. Certainty wherever possible
11. Co-operation on crime, terrorism and foreign affairs
12. A phased approach, delivering a smooth, orderly Brexit

Things they had before Brexit
4. Maintaining the common travel area with Ireland
6. Rights for EU nationals in Britain and British nationals in the EU
8. Free trade with European markets

Measurable things (I think they won't get)
3. Strengthening the United Kingdom
7. Enhancing rights for workers
9. New trade agreements with other countries. This probably comes down to an improved economy, so measures of economic trade could be used for this one.
10. A leading role in science and innovation

I am willing to pick measurable metrics for these last four. For strengthening the UK, the % of people in Scotland who want independence. For workers, where the UK stands in global rankings of workers' rights. For science and innovation, patent and journal paper output; there are other metrics of a country's innovation, and university league tables are another possible example.

Trade agreements are mainly about the economy. Inflation, consumer debt, Sterling's value, GDP growth and export growth are all useful metrics.


I can't think of an obvious metric that shows making your own laws in Parliament has been a good idea. But there are 8 other objectives May wants that are measurable, plus general economic metrics that most people accept as important.


With at least 10 things to measure to decide whether Brexit is going well or badly, I think it is reasonable for Leavers and Remainers to define what they would see as success for Brexit. This won't take into account big downside economic or military risks, or people's happiness at increased national sovereignty, though national happiness metrics might work.
But you can measure some of the things people say are important, so why not define metrics for what would mean Brexit was a success?
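Defining success in advance can be made concrete as a table agreed before the results come in: one row per metric, with its baseline value, its latest value and which direction counts as better. A minimal R sketch; the metric names and numbers are whatever Leavers and Remainers agree on, and none are supplied here.

```r
# Sketch: score a set of pre-agreed metrics against their baselines.
# Expects columns: metric, baseline, latest, higher_is_better (logical).
score_metrics <- function(metrics) {
  change <- metrics$latest - metrics$baseline
  # A metric improved if it moved in the direction agreed to be better
  metrics$improved <- ifelse(metrics$higher_is_better, change > 0, change < 0)
  metrics
}
```

Net migration would be a lower-is-better row (objective 5), while GDP or export growth would be higher-is-better rows; `sum(score_metrics(m)$improved)` then counts how many objectives are moving the right way.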

Friday, January 13, 2017

Irish Election Spending 2016

In the Irish election 2016 who paid the most for each vote and for each seat?
Total spending was €8,394,832.89 (report here). With an electorate of 3,305,110, that is about €2.54 spent per registered voter, under half what is spent per vote in a US presidential election.
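The per-voter figure is the total divided by the electorate. The electorate counts registered voters, not votes cast, so spending per vote actually cast would be somewhat higher.

```r
# Total reported election spending and registered electorate, as given in the post
total_spending <- 8394832.89
electorate <- 3305110

# Euros spent per registered voter
per_elector <- total_spending / electorate
round(per_elector, 2)
```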
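Spending per first-preference vote:
[Chart: General Election Spending 2016, Euros Per Vote by party]

And on a per-seat basis:
[Chart: General Election Spending 2016, Euros Per Seat by party]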

Party,"Votes,1st pref.",Seats,Spending
Fine Gael,544140,50,2768881.50
Fianna Fáil,519356,44,1687916.29
Sinn Féin,295319,23,650190.38
Labour Party,140898,7,1083718.38
AAA–PBP,84168,6,266942.48
Ind 4 Change,31365,4,51669.18
Social Dem,64094,3,190586.93
Green Party,57999,2,146792.27
And the R code (using the dplyr and ggplot2 packages) is:

library(dplyr)    # mutate()
library(ggplot2)  # ggplot(), geom_bar(), scale_fill_manual()
# spending.csv is the table above; read.csv() turns the header "Votes,1st pref." into Votes.1st.pref.
data <- read.csv("spending.csv", header = TRUE, fileEncoding = "UTF-8")

# Euros spent per first-preference vote (perV) and per seat won (perS)
datat <- mutate(data, perV = Spending / Votes.1st.pref., perS = Spending / Seats)

# One colour per party, in alphabetical order of party name, which is the order ggplot2 assigns them
party_colours <- c("#E5E500", "#66BB66", "#6699FF", "#99CC33",
                   "#FFC0CB", "#CC0000", "#008800", "#752F8B")

# Chart 1: Euros per vote
pv <- ggplot(data = datat, aes(x = Party, y = perV, fill = Party)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = party_colours) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1), legend.position = "none") +
  labs(title = "General Election Spending 2016", y = "Euros Per Vote")
print(pv)

# Chart 2: Euros per seat (stored separately so it no longer overwrites chart 1)
ps <- ggplot(data = datat, aes(x = Party, y = perS, fill = Party)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = party_colours) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1), legend.position = "none") +
  labs(title = "General Election Spending 2016", y = "Euros Per Seat")
print(ps)
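With the charts not reproduced here, the same per-vote and per-seat figures can be printed as a sorted table. A minimal base-R sketch that types in the table above rather than reading spending.csv, so it runs without the file. On these figures the Labour Party spent the most per vote (about €7.69) and per seat, and Ind 4 Change the least on both.

```r
# The table above typed in directly (Votes = first-preference votes)
spending <- data.frame(
  Party = c("Fine Gael", "Fianna Fáil", "Sinn Féin", "Labour Party",
            "AAA–PBP", "Ind 4 Change", "Social Dem", "Green Party"),
  Votes = c(544140, 519356, 295319, 140898, 84168, 31365, 64094, 57999),
  Seats = c(50, 44, 23, 7, 6, 4, 3, 2),
  Spending = c(2768881.50, 1687916.29, 650190.38, 1083718.38,
               266942.48, 51669.18, 190586.93, 146792.27),
  stringsAsFactors = FALSE
)

# Euros per first-preference vote and per seat won
spending$perV <- round(spending$Spending / spending$Votes, 2)
spending$perS <- round(spending$Spending / spending$Seats)

# Parties sorted from most to least spent per vote
by_vote <- spending[order(-spending$perV), c("Party", "perV", "perS")]
print(by_vote, row.names = FALSE)
```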




Wednesday, June 01, 2016

The Name of the Youngest Ever Modern Olympics Gold Medal Winner is Unknown

At the 1900 Olympics the Dutch rowing team were short a cox. They used a rower, Hermanus Brockmann, in the semifinal, but decided his 60 kg weight was too much of a handicap.

So the rowers, François Brandt and Roelof Klein, picked a ten-year-old French boy (25 kg) out of the crowd and asked him to cox for them.

They won gold and took a photo with the boy. But his identity has never been established.

Thursday, May 19, 2016

Dying at Work in the US

The Occupational Safety & Health Administration (OSHA) tracks workplace fatalities in the US and publicly releases CSV records of each year's workplace deaths.

The data contains the date, location and a description for 4,000 fatalities over five years. I created columns for state, ZIP code, number of people and cause.
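Columns like state and ZIP code can be pulled out of a free-text location with a regular expression. A minimal R sketch, assuming locations look like "City, ST 12345"; the real OSHA field may be laid out differently, and this is not necessarily how the columns here were created.

```r
# Sketch: extract a two-letter state code and a five-digit ZIP code from a
# free-text location. ASSUMES the layout "City, ST 12345".
extract_state_zip <- function(location) {
  pattern <- ",[[:space:]]*([A-Z]{2})[[:space:]]+([0-9]{5})"
  m <- regmatches(location, regexec(pattern, location))
  # Each element is c(full match, state, zip), or character(0) when nothing matched
  pick <- function(i) vapply(m, function(x) if (length(x) == 3) x[i] else NA_character_, character(1))
  data.frame(state = pick(2), zip = pick(3), stringsAsFactors = FALSE)
}

# Deaths per state, the kind of count a state map is built from
# (locations is a hypothetical vector of location strings)
# table(extract_state_zip(locations)$state)
```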

The most common interesting words in these descriptions are:

  • 813 fell
  • 708 struck
  • 642 truck
  • 452 falling
  • 382 crushed
  • 352 head
  • 263 roof
  • 261 tree
  • 258 electrocuted
  • 244 ladder
  • 238 vehicle
  • 226 trailer
  • 197 machine
  • 186 collapsed
  • 180 forklift
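Counts like these come from splitting every description into words and tabulating them. A minimal R sketch; the stop-word list is a short hypothetical one, and how the "interesting" words above were picked out is not described. Words are not stemmed, so "fell" and "falling" are counted separately, as in the list above.

```r
# Sketch: frequency of each word across all fatality descriptions.
# The default stop-word list is hypothetical; no stemming is done.
count_words <- function(descriptions,
                        stop_words = c("a", "an", "the", "of", "by", "from", "was",
                                       "and", "to", "in", "on", "at", "when", "while",
                                       "his", "he", "were", "is", "with")) {
  # Lower-case, then split on anything that is not a letter
  words <- unlist(strsplit(tolower(descriptions), "[^a-z]+"))
  words <- words[nchar(words) > 0 & !(words %in% stop_words)]
  sort(table(words), decreasing = TRUE)
}

# head(count_words(osha$description), 15)  # osha$description is a hypothetical column name
```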

Not common but interesting

  • 10 lightning
  • 48 shot
  • 4 dog
  • 2 bees

And here is a map I made of the states where they happen:

I have created a repository to try to augment the OSHA data and to clean it up when errors are found.

The repository is on GitHub here.

If you use it, I'll give you edit rights and you can help improve it.

Sunday, May 15, 2016

Handpicked by Amazon

Whenever I look at a product on Amazon, for the next few days I see that product in advertisements on Facebook.

Handpicked?

Why would Amazon lie like this?

Thursday, April 21, 2016

Can you Judge a Book by its Cover?

"they've all got the same covers, and I thought they were all o' one sample, as you may say. But it seems one mustn't judge by th' outside. This is a puzzlin' world." The Mill on the Floss by George Eliot
What is the correlation between people's ratings of a book's cover and the ratings the book receives? This post is about a game devised to get people to rate book covers, and gives some great visualisations comparing a book's Goodreads rating to its cover rating. They gathered over 3 million ratings of 100 covers.

I took their data and got the average rating for each of the covers they tested. I then scraped these 100 books' Goodreads average ratings, number of ratings and number of reviews. The data table and the code I used to scrape and aggregate it are here. There are all sorts of accuracy warnings you can imagine around these results. The main one is that the books and their covers all look pretty good to me: they are not from the self-published fan fiction end of the market. The variables here are:

  • num_ratings: number of Goodreads ratings.
  • rating: average rating of the book.
  • num_reviews: number of people who have actually written a review.
  • cover_rating: the average rating people gave the cover of the book.
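Averaging millions of individual cover ratings down to one number per book, then attaching the Goodreads figures, is an aggregate followed by a join. A minimal R sketch; the column names `book` and `score` are hypothetical, since the game's data layout is not described here, and this is not the linked code.

```r
# Sketch: mean cover score per book, joined to scraped Goodreads figures.
# cover_votes has one row per individual cover rating (columns book, score: hypothetical);
# goodreads has one row per book and a matching book column.
cover_means <- function(cover_votes, goodreads) {
  covers <- aggregate(score ~ book, data = cover_votes, FUN = mean)
  names(covers)[names(covers) == "score"] <- "cover_rating"
  merge(goodreads, covers, by = "book")
}
```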

> cor(rating,cover_rating)

[1] 0.1609114

> cor(num_ratings,num_reviews)

[1] 0.9597442

> cor(rating,num_ratings)

[1] 0.2141307

> cor(rating,num_reviews)

[1] 0.2658916

> cor(num_ratings,cover_rating)

[1] 0.3059627

> cor(num_reviews,cover_rating)

[1] 0.3307553
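With only 100 books, it is worth checking whether these correlations are distinguishable from zero at all. A minimal sketch using the standard t-test for a Pearson correlation, assuming all 100 books are in every calculation (the post implies this but does not say so). Given the raw columns, `cor.test(rating, cover_rating)` reports the same p-value directly.

```r
# Sketch: two-sided p-value for a Pearson correlation r from n pairs, using
# t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.
cor_p_value <- function(r, n) {
  t <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t), df = n - 2)
}

cor_p_value(0.1609114, 100)  # rating vs cover_rating
cor_p_value(0.3307553, 100)  # num_reviews vs cover_rating
```

On the reported figures, the cover/book-rating correlation of 0.16 has a p-value a little above 0.1, so it is not significant at the usual 5% level, while the cover/review-count correlation of 0.33 has a p-value well below 0.01.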

So no, you can't judge a book by its cover: the correlation between cover rating and book rating is only 0.16. You can predict the number of ratings from the number of reviews (0.96). You can't predict how highly rated a book is from its number of ratings (0.21). Having a good cover might increase the number of reviews your book gets a bit (0.33).

The conclusion is you shouldn't judge a book by its cover. Or by its number of sales (ratings). But people probably do judge books by their cover a bit.