Thursday, December 9, 2010

A Lesson in Citizen Science

I came across an article that provides an excellent example of what citizen science can do. The blog was Cosmic Variance and the article was Science for the masses. The article describes a project called GREAT 2010, which is attempting to improve analysis methods that are used to detect dark matter in the universe. These methods work by detecting how light from far away galaxies or stars are pushed and pulled by dark matter near their path. However, there are many factors that make this analysis harder. The atmosphere distorts the images we see, and the galaxies and stars are never quite exact shapes, so you must correlate several images in order to get a clear picture.

What is unique about this project, and what we can apply to citizen science (in particular for Climate Science), is that this project is not only providing a venue for citizen scientists to contribute, this project is encouraging contribution from those outside the field who have relevant skills (namely programming image processing software)! This isn't just asking backyard astronomers for their photos, it is asking skilled (but lay) people to look at a difficult problem and give them their insight. What this project is doing is saying there are people outside the field who have insight and skills that can be applied to these problems. I think science (in general) needs a bit more of this type of project and climate science is no exception.

The other interesting aspect of this project is how it is judging the various methods that will be submitted. They are inventing a "fake" data set that they feed out to the public for this contest. When teams submit their methods with the results they get, these results are compared to the model used to create this data set. What is truly interesting when compared to climate science is how a "fake" data set can speak directly to a standard skeptic talking point. That is, by using a "fake" data set, they don't test the theory that is being built upon (AGW or some aspect of it), but they do test the analysis method used to extract the results.

I'm looking forward to projects like these in climate science and I am optimistic about their ability to help the science.

Tuesday, October 12, 2010

Gridding GoGCM

I have added a new model (model 003) in the examples code. This model is an addition to the last tutorial model (the "glass slab" model). This model is to demonstrate how I see some of the gridding techniques for GoGCM. I'm hoping this will spark some discussion on how to pass the information around in a more efficient manner and I'll also try to engage the GoNuts google group for some feedback.

The model is quite a simple model at this point, and there are at least two things that I want to fix in the relatively short term. The radiative model is essentially what was described in the tutorial, with two layers for the atmosphere and the atmosphere and surface radiating in proportion to the temperature according to the Stefan-Boltzmann law. I wanted to transfer some energy from grid point to grid point so I added a simple conduction type transfer (I know this is not physical, but for now it will suffice). I have also added a heat capacity for each grid point to enable an implementation of the daily cycle.

The next step for this model is to... well... actually run it. Yes, for right now I haven't set-up all the grid points and run a few test cases. What I've done so far has been to create the basic set-up so that I can start creating these cases. The next steps will be to initiate the data structures and trying a few test cases (I'll try to get this done over the next few days).

Also in the near term, I want to add a daily cycle to this model (possibly even seasonal variations). For that, I'll have to see how the initial tests go in the next few days.

Let me know if you have suggestions or comments on the code so far. I'm hoping that by cleaning this up a bit it could serve as the basis for GoGCM (barring some other decisions about architecture that I'll post about another time). At the very least, I hope others find this interesting.

Tuesday, September 28, 2010

A "Glass Slab" Model

Here is the second tutorial as provided by Barton Paul Levenson. Again, I'll present the tutorial, followed by an explanation of the code added to the github repository. I'll continue to use these simple models to demonstrate some of the key concepts we'll be using to build GoGCM (as I did with the sharing post).

Here's the tutorial.

A "Glass Slab" Model
Written by Barton Paul Levenson

Now we add an atmosphere to create a "glass slab" model. Model 1 just had two factors affecting the Earth's surface temperature: sunlight and reflection. Model 2 adds IR back-radiation.

The Earth's albedo is assumed to apply to the very top of the atmosphere, giving F = 237 W m-2 from the 342 W m-2 which actually comes from the sun.

The surface absorbs all sunlight and all IR that falls on it. The atmosphere absorbs all IR, but no sunlight. Energy balances for atmosphere and surface, respectively, are:

Atmosphere: \begin{equation}F_s=2F_a\end{equation} Surface: \begin{equation}F+F_a=F_s\end{equation}
Here, F is the incoming sunlight, as defined in model 1 from the Earth's albedo and solar constant. Fa is the amount radiated from the atmosphere--since it has both a top and a bottom, it radiates Fa in each direction. Fs is the amount radiated from the Earth, which only goes up. The atmosphere has Fs from the ground as its only energy input; its output is 2 Fa. The Earth gets both F from the sun and Fa from the atmosphere; its output is Fs.

Assuming perfect emissivity, we can substitute in the Stefan-Boltzmann law to rephrase these equations in terms of temperature:

\begin{equation}\sigma T_s^4=2\sigma T_a^4\end{equation} \begin{equation}F+\sigma T_a^4=\sigma T_s^4\end{equation}
Dividing through all terms by σ to simplify, we get

\begin{equation}T_s^4=2T_a^4\end{equation} \begin{equation}\frac{F}{\sigma}+T_a^4=T_s^4\end{equation}
Ts4 occurs on the left-hand side (LHS) of the top equation and the right-hand side (RHS) of the second. We can therefore eliminate Ts between the two equations and find that

\begin{equation}T_{a}=\left(\frac{F}{\sigma}\right)^{0.25}\end{equation}
And, with more algebra:

\begin{equation}T_{s}=2^{0.25}T_a\end{equation}
yielding Ta = 254 K. Do you recognize the Earth's radiative equilibrium temperature, Te? And Ts = 303 K. The latter figure is 15 K over the actual figure--5% too high--but that's closer than last time.

But Earth's atmosphere isn't equivalent to one blackbody layer in IR-absorbing ability. It's closer to two. We would then have, with layer 1 on top and layer 2 on bottom, and the ground underneath both, T1 = 254 K, T2 = 303 K, and Ts = 335 K, which is too high by 47 K, or 16%. This is worse than model 1! Clearly we're leaving out something important.

Go Code
Arrow

I've added the code to the github repository, under the examples directory as model 002. I've reused the code from model 001, including the improvements as described in the sharing post. I'll keep working on model 002. Let me know if you'd like to see anything in particular.

Friday, September 24, 2010

Everyone loves sharing

In the last post, the tutorial that Barton Paul Levenson wrote described a simple radiative model of the earth with no atmosphere. In this post, I will modify the code that I wrote for this model and show some of the power and expressiveness of Go (hopefully).


Before I begin with the code, I wanted to point out a video and a paper that I think shows some of the promise of Go. The video is of Rob Pike describing how channels and goroutines allows a programmer to write exactly what he wants to do, and no more.


The paper, mentioned by Pike in the video, is called "Squinting at Power Series". This paper describes how to use channels and concurrency to represent power series. This representation is inherent in the concurrency and allows various operators to be defined extremely quickly. In particular, defining differentiation and integration are almost trivial when representing power series in this manner.


One of the interesting aspects of this video and paper, the video was recorded as Rob Pike was just starting to work on Go (or maybe just before) and the paper was written several years before that.


Ok, now back to GoGCM. At the end of the last post, I wrote Model_001. This code had two functions,

func solar(ch chan int) (out chan datapoint) and
func gcm(ch chan datapoint) (out chan datapoint)


Each of these functions started a goroutine, which communicated to the main program through two channels. The input (passed as ch in the parameters) and the output channel (created by the function and returned to the caller). All these channels pass information by value, meaning a copy is made to give to the function or return from the function. This is fine for this small toy example (where there are 4 values in a struct), however as the examples will become more and more involved, we will want to be able to store information in one location and refer a function to that location in order to reduce the overhead.


I'm going to introduce you to a channel of pointers. To understand this, you must first understand what channels are and how they behave. A much better place to learn about channels is the Go resource sites linked on the side of this blog, but I will try to explain some of the basics. If you have any questions, don't hesitate to ask them in the comments.


A channel is a way to pass information from one goroutine (lightweight thread of execution) to the next. A channel is like any other variable, you can assign it, allocate it, but the major difference is that you can also send or receive from it. If I create a channel like so


ch := make(chan int)


then I send a value to it


ch <- int(1)


another goroutine can receive that value from the same channel like this


in := <-ch
fmt.Printf(in) // Prints 1


Channels can pass more than just values however, and can pass user created structures as well such as datapoint in model_001, pointers or even other channels (channel of channels!). What we want to do is pass pointers to the datapoint structure, since this is where all the data of our model is located. The functions are now redefined like so


func solar(ch chan *datapoint) (out chan *datapoint) and
func gcm(ch chan *datapoint) (out chan *datapoint)


Now I hear some are already groaning, because passing pointers between  executing concurrent code is dangerous without mutexes, semaphores and all kinds of concurrency contructs and an expert who can code all of the above with a sufficient artistic touch! Using idiomatic Go, it's just not necessary! Go has a very important slogan that describes how idiomatic Go deals with these issues. (see the Effective Go page on the website www.golang.org).


"Do not communicate by sharing memory; instead, share memory by communicating"


The go blog describes this in the post "Share Memory by Communicating". The principle is that when you pass a pointer over a channel, the receiving goroutine becomes the owner of that resource. This lets you inspect your code quickly and see when you are the owner and when it's owned by another goroutine.


I have implemented this idea in model_001, so that a single datapoint variable is passed to func solar(...) and to func gcm(...). This structure of programming is very versatile. To demonstrate this, lets say that we know what the temperature was on our model world, and we also know that the only way it varied was by the variation of the solar output. By adding a few extra lines of code in each function, we can pass a blank datapoint to gcm, which calculates the temperature and the required F. This datapoint is then passed to solar, which calculates the solar constant required (S).


In fact, you can even pass datapoint to solar or gcm in any order (even randomly). I have coded a solar driven and a temperature driven function, and even demonstrated a hybrid driven model.


The power of goroutines to express how to assemble a model is one of the great strengths of Go. I hope this examples demonstrates this. Let me know what you think, or if you see a way to improve this example.

Thursday, September 16, 2010

Zero-Dimensional Energy-Balance Model

As many of you have probably noticed (at least if I judge from my gradually reducing traffic over the last few days) I have been a little lighter on number of blog posts than the torrent pace that I started at over the weekend. While this is partly that I have a job during the day and will likely do more posting on the weekend, I have also been spending a bit of time studying the GISS ModelE code and planning the next steps for GoGCM.

While I'll continue for the next week or two with this studying and planning, Barton Paul Levenson (regular commenter at RealClimate) has graciously passed me a tutorial that he's worked on himself while trying to understand radiative/convective models. I've added these models to the github repository for GoGCM. At the end of each tutorial, I'll describe the example(s) that I've put on the  code repository.

Zero-Dimensional Energy-Balance Model
Written by Barton Paul Levenson

We begin with the simplest possible model. The sunlight absorbed by the climate system is:

F = (S / 4) (1 - A)                                                (Eq. 1)

where F is flux density in watts per square meter, S the solar constant in the same units, and A the Earth's bolometric Russell-Bond spherical albedo. The factor of 1/4 is there because the Earth absorbs sunlight on its cross-sectional area, π R2, but has a total surface area four times as great, 4 π R2.

We find the Earth's radiative equilibrium temperature--also called "emission temperature" and "effective temperature"--by assuming perfect emissivity and inverting the Stefan-Boltzmann law:

Te = (F / σ)0.25                                                  (Eq. 2)

In the SI, the Stefan-Boltzmann constant σ (sigma) has the value 5.6704 x 10-8 W m-2 K-4. The 1951-2000 mean of Judith Lean's TSI reconstruction is S = 1366.1 watts per square meter, and NASA gives A = 0.306 for the Earth. That means F = 237 W m-2 and Te = 254 K. Water freezes at 273 K, so this would leave the Earth frozen over. Earth's actual surface temperature is Ts = 288 K, 34 K higher than the emission temperature. The difference is due to the greenhouse effect. If we took Ts = Te, we'd be too low by 12%. We need to improve our model.

Go Code
Arrow

I've added an examples directory with model 001. This model is the go code of the zero-dimension model described by tutorial here. While I used all the same values, I changed the solar constant for each time step (using a random value) in order to have a time series as output that varies. With the value set in the code, I had a temperature variation of approximately 1 deg K. This of course will change with each run since it depends on the random values. I encourage everyone to download the source code and try it out. Let me know what you think.

Once again, thanks to Barton for letting me use the tutorial for these series of posts.

Sunday, September 12, 2010

"Re" Introducing GoGCM

I've created the github repository for GoGCM. The "model" is currently just a place holder. It is a mean (configurable but set to 17.5 by default) that has a random value added to it (standard deviation of 1 and mean of 0). The point is more about the process of installing it and running it in order to get information from it. In order to install and run this, follow these instructions.
  1. Install and build the go repository by following the instructions on the go website. For those using windows, you will likely need to install it from here.

  2. I believe you need a github account for the next step. Go to the website and create an account.

  3. Go has a very handy tool called goinstall. Open a console and enter the following commands:
  4. goinstall github.com/Arrow/GoGCM
    cd go/src/pkg/github.com/Arrow/GoGCM
    gomake

  5. These commands will download the code from the github repository and build the GoGCM code. You can now run the code with
  6. ./gcm

  7. This will output a list from 0 to 100 with the model output in a csv format (to stdout). You can do the usual piping and send it to a text file like so
  8. ./gcm > model.csv

That is the basics of installing the (currently very simple) GoGCM. If you run into any problems just let me know or check the go website and the go-nuts group to see if others have had the same issues.

Evaluating Success

As I had stated in my last post, I intend to fill in the blanks for how this project will proceed.


The first issue to settle, is how the design will be decided. This blog will serve as the point of discussion for all the technical issues. My intent is to evaluate each design decision point by point in this forum (I'm hopeful other's will join me in leading these discussions).


Once each decision is made, the code will be updated in the project. I will be setting up a github repository for the code later this week, along with an initial simple model to demonstrate how the code will be set-up.


The next issue to settle is how the project will be evaluated. I considered trying to implement a direct copy of another GCM (for example GISS's Model E). It would be a much easier undertaking and the project would likely create a running GCM much quicker. I decided against this path for the reason that less would be learnt by doing this, as all the decisions would be to follow a give model. As such, these decisions will consider how other models do this but will not necessarily follow a given model (for the moment, I'll be especially focusing on the GISS Model E and NCAR CESM). 


As an example of the types of decisions to be taken, consider the grid selection of the surface. GISS uses a Cartesian grid (4deg X 5deg and 2deg X2.5deg). While this is a more straightforward grid, it also leads to a bunching of the grid boxes at the poles. There are other grids that could be used and it is one of the issues that I intend to bring up on this blog.


Since GoGCM will not be based off a given model, it does beg the question as to how the model will be evaluated. I want to select a model that will provide a reference for comparison for the work being done on GoGCM. The reference will be GISS's models. Starting with trying to replicate GISS Model II (or EdGCM), this will allow GoGCM to evaluate it's performance and what it's design decisions are achieving. I want to know what others think of this suggestion. Please comment on what you think of this decision.


Finally, you may ask what I hope to achieve with this project? My hope is that this project will demystify GCM's. I want to understand how they function and what type of information they can give us. By understanding this, we can also help to evaluate their results. As I have seen Gavin comment at realclimate, there is a lot of data that is output from GCM's. By better understanding GCM's, maybe this output can be more thoroughly analysed.


Finally, it is my hope that this project will lead to better GCM's. I think there is a long way to go from this point, however having more minds looking at these issues is bound to generate a host of ideas about how to improve current GCM's. I would feel extremely proud if even one of these ideas leads to an improvement in any of the current GCM's or the analysis of their results.

Saturday, September 11, 2010

Introducing Go

I think that for many of those from the climate change blogs, Go is a new language that will be needed for this project. In this post, I will introduce the basic concepts of the language. I will point out however that the best place to learn more about this language is through their website (in particular the tutorial and the effective go page). Also some of the blogs on the blog roll have some particularly good articles about Go, especially Andrew Gerrand's blog.

Go was developed at Google and released as an open source development in November 2009. The designers (Robert Greisemer, Rob Pike and Ken Thompson) have quite a pedigree and many were excited just by the fact of who designed the language. Since then, a strong community has grown around Go and many people are finding that it is quite powerful and expressive.

Some of the basic features are that it doesn't have classes like C++, but you can create methods attached to c like structs. This allows some of the functionality of objects, without the complex type heirarchies. In the same line of thinking, there is no inheritance in Go. There is an interface declaration however, that allows similar constructs to be built and yet be quite simple to use as well.

Another feature of Go is it's Concurrency constructs. It has a keyword to launch a small lightweight thread of execution. This is called a goroutine and it allows a parallel thread of execution (see the go site for details of it's implementation). Go also has a channel type in order to pass information between various goroutines.

The interesting part about goroutines and channels, is that it allows a programmer to define very concisely the work needed to be done by any given goroutine, and also defining it's interaction with other goroutines through the channels. The real strength of these concurrency constructs is they provide a manner to logically describe the work that needs to be done. I'll try to add some examples applied to some GCM calculations and how these constructs can link together to be very powerful.

There are a few aspects of go that could be troublesome for this project however. Go is a Garbage Collected language. Generally I think this is good, especially in a concurrent language like Go, however there are some issues related to high performance calculations where it could be a nuissance. It will be something to pay attention to and I'm hoping to get some help from the current Go community to tackle these issues.

Another issue (right now) is that go's support for windows is not complete. While there has been some great strides, the best bet to run Go is to run it on Linux or Mac. This is being actively worked on by the Go community and I'd expect it to improve rather quickly. I'm quite hopeful that by the time there is any meaningful coding happening on GoGCM, the windows support will be a non issue.

I'm certain there will be other issues as the project moves forward, however I truly believe that the benefits of Go outweigh the issues. I'm looking forward to learning more about Go as others get their feet wet and take on the various challenges for this project. I encourage those who want to learn go to look at the documentation on their website and to engage the go community. I think you will find it well worth the effort.


My next post will be in the next day or two and will discuss the strategy of GoGCM. How will the design be fleshed out, how it will be compared to other GCM's (will it be baselined to an existing model) and what will it try to achieve.

Friday, September 10, 2010

Introducing GoGCM

I have been interested by two subjects recently. Climate Change and the Go Programming Language. This blog is a combination of these two interests. I want to start a project to create an open source General Circulation Model (GCM) in Go. The obvious questions are, why a GCM and why Go?

Why a GCM?

There has been much interest in "citizen science" of late in the climate change blogosphere. There have been many global temperature reconstructions over the past several months. I think efforts like this are extremely valuable and that more should be done, especially in order to improve our understanding of (or increase our confidence in) an important issue such as climate change.

In this light, I want to start a project in order to create an open source GCM. This blog will have a series of posts that discuss various aspects of GCM's and how to implement it in GoGCM. I think GCM's are often misunderstood and these posts (while trying to design this GCM) will shed some light on how GCM's work, and what they can/can't do.

While the main audience of this blog will be "citizen scientists" (eg. Engineers, Programmers and other technical persons interested in Climate Change), I am hoping that some of the Climate Experts will contribute in some form (commenting or guest posts) from time to time.

Why Go?

Since it was released in November 2009, I have been following the development of the Go Programming Language. My first reaction was, if you take a few interesting languages and strip the interesting parts out, you get Go. Since then, I've been looking more and more at this language, and I have been extremely impressed. While the feature set seems basic on first inspection, it is quite expressive. Andrew Gerrand had two posts (part 1 and part 2) that demonstrates this nicely.

I think the Go Programming Language represents an important innovation in programming. In particular, it's ability to reduce bookkeeping and focus on the important parts of code are remarkable. While many in the "citizen science" climate community have been relying on R for the analysis of the temperature reconstructions, I believe a project like this requires something more substantial.

The other option would be using directly the source code provided by current GCM's (GISS and NCAR for example). While this option likely has a better chance of success (a GCM that actually runs), developing GoGCM will force all aspects of the GCM to be investigated. This means that the GCM's will be understood more by the community that I hope this blog fosters.

The final reason I chose Go, is that I am optimistic about the Go Programming Language. I believe it could be an important general programming language in the future. By trying to create GoGCM (especially with a vocal "citizen science" community), I'm hoping to attract dozens of new Go programmers, particularly those who are interested in scientific programming like I am.

Going Forward

I'll be laying out some of basics rules for how this site will work and try to start some technical threads for the basic design decisions. As well, I'll be setting up project page (likely github) in which I will post the code. I look forward to any feedback.