Sunday, September 12, 2010

Evaluating Success

As I stated in my last post, I intend to fill in the blanks on how this project will proceed.

The first issue to settle is how the design will be decided. This blog will serve as the point of discussion for all the technical issues. My intent is to evaluate each design decision point by point in this forum (I'm hopeful others will join me in leading these discussions).

Once each decision is made, the code will be updated in the project. I will be setting up a GitHub repository for the code later this week, along with an initial simple model to demonstrate how the code will be set up.

The next issue to settle is how the project will be evaluated. I considered trying to implement a direct copy of another GCM (for example, GISS's Model E). That would be a much easier undertaking, and the project would likely produce a running GCM much sooner. I decided against this path because less would be learnt that way: every decision would simply follow a given model. Instead, these decisions will consider how other models handle each issue without necessarily following any one of them (for the moment, I'll be focusing especially on the GISS Model E and NCAR CESM).

As an example of the types of decisions to be taken, consider the selection of the surface grid. GISS uses a regular latitude-longitude grid (4° × 5° and 2° × 2.5°). While this is the most straightforward grid, it also leads to a bunching of the grid boxes at the poles. There are other grids that could be used, and this is one of the issues I intend to bring up on this blog.

Since GoGCM will not be based on a given model, the question arises of how the model will be evaluated. I want to select a model that will provide a reference for comparison for the work being done on GoGCM. That reference will be GISS's models. Starting by trying to replicate GISS Model II (or EdGCM) will allow GoGCM to evaluate its performance and what its design decisions are achieving. I want to know what others think of this suggestion, so please comment on this decision.

You may ask what I hope to achieve with this project. My hope is that it will demystify GCMs. I want to understand how they function and what type of information they can give us. By understanding this, we can also help to evaluate their results. As I have seen Gavin comment at RealClimate, there is a lot of data output from GCMs. By better understanding GCMs, maybe this output can be more thoroughly analysed.

Finally, it is my hope that this project will lead to better GCMs. There is a long way to go from this point, but having more minds looking at these issues is bound to generate a host of ideas for improving current GCMs. I would be extremely proud if even one of these ideas leads to an improvement in any current GCM or in the analysis of its results.


  1. Hmm. At least Go isn't post-1977 Fortran, which means tying yourself to a dreadful codebase forever.

    The first step is to decide whether to create your own dynamic core.

    Reasons to create your own dynamic core: well, you chose Go presumably because of its parallelism, and this is the parallel part of a conventional GCM. Plus it's a beautiful problem, and there is a standard benchmark developed by Isaac Held, so you will know when you've solved it.

    Reasons not to create your own dynamic core: it is a solved problem. It really isn't climate at all. If it attracts a community, it will be one of computational-science nerds rather than climate nerds, and will provide a lousy launchpad for the rest of the project.

    It seems to me that if you build your own dycore you will never get to do any climatology, and if you don't, the advantages of Go over Python go away.

    When I have the fantasy you are describing, I first decide how to avoid rebuilding a dynamic core, since this is the only part that is actually provable. I am torn between the MITgcm core and a PETSc build. Normally I just assume I am going to do the rest in Python and MPI, because there's such a thing as too many adventures.

    Then I decide it's too much work and not enough glory (or meaningful contribution) and move on to the next daydream. But if I were to do it, the next step would be to get an F77 model and strip out the COMMON blocks and then splinter it into testable chunks.

  2. Thanks for the comment, Michael. I'm happy to say yours is the first comment on this project. I hope you continue to visit the site and comment.

    You are correct that it would be faster to implement a dynamic core that already exists. However, my intent isn't just to get something running; it's to understand the physics behind the models and clarify it for the technical public. While the dynamic core isn't the cutting edge of climate science, it does represent the basic physics, which is exactly what I intend to examine.

    The other aspect to mention is that this project is also partly about Go. I'm very excited by this new language, but at the moment there isn't much interest in using it for scientific computing. I think this is a shame, and part of this project is intended to attract more scientists to the language.

    While Go handles parallel programming quite nicely, it is the combination of its concurrency constructs (goroutines and channels) with features like interfaces that makes it very simple to code with. I think scientific computing is craving this type of simplicity without having to resort to an "old" language from before programming languages got complicated. I encourage you (or anyone else) to read the blog posts by Andrew Gerrand that illustrate this point quite well (the sudoku ones are especially interesting).
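    As a minimal illustration of the goroutine-and-channel style (a toy of my own, not taken from those posts), here is a sum split across two goroutines that report back over a channel:

    ```go
    package main

    import "fmt"

    // sum sends the total of one slice segment down a channel.
    func sum(xs []float64, c chan<- float64) {
    	total := 0.0
    	for _, x := range xs {
    		total += x
    	}
    	c <- total
    }

    func main() {
    	data := []float64{1, 2, 3, 4, 5, 6}
    	c := make(chan float64)

    	// Split the work between two goroutines; the channel both
    	// returns the results and synchronises the workers.
    	go sum(data[:3], c)
    	go sum(data[3:], c)

    	fmt.Println(<-c + <-c) // prints 21; arrival order doesn't matter for a sum
    }
    ```

    There's no explicit locking or thread bookkeeping; the channel is the whole synchronisation story.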

    Thanks again for contributing.

  3. What's your experience with numerical methods for PDEs?

    The sudoku thing is cute but is in a very different problem class than a climate model.

    Are you imagining each grid cell in a goroutine? Or finer grain than that? How does that help?

  4. I'm an engineer, and my work involves modelling in aerospace. It's mostly relatively basic components; however, it drives some actuators and such, which means I need to understand vibrations and a few control-system concepts.

    The sudoku example illustrates that Go can reduce some of the bookkeeping required in using the language. While I'd like to show an example of numerical methods taking advantage of this, there aren't many users yet who have that as a priority. I'm hoping this project will attract more users whose main interest is scientific computing.

    I haven't decided on the overall architecture for the program. My first impression isn't to have one goroutine per grid point; I don't think that would be the best use of resources.

    My initial impression is to have a "worker"-type structure with one goroutine for each component. I would define a structure containing all the required information for a grid point. A pointer to this structure would be passed to different goroutines to calculate the different components of the model (e.g. Atmosphere, Radiation, Ice Dynamics, etc.).
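    A minimal sketch of what this worker structure might look like (the struct fields, component updates, and numbers here are placeholders of my own, not real physics):

    ```go
    package main

    import "fmt"

    // GridPoint holds the state a component needs at one grid point.
    // The fields are placeholders; a real model would carry full column state.
    type GridPoint struct {
    	Temperature float64 // K
    	Pressure    float64 // hPa
    }

    // component runs as a worker goroutine: it receives pointers to grid
    // points, applies its update, and passes them downstream.
    func component(update func(*GridPoint), in <-chan *GridPoint, out chan<- *GridPoint) {
    	for p := range in {
    		update(p)
    		out <- p
    	}
    	close(out)
    }

    func main() {
    	src := make(chan *GridPoint)
    	mid := make(chan *GridPoint)
    	done := make(chan *GridPoint)

    	// Two components chained in a pipeline, standing in for e.g.
    	// "Atmosphere" and "Radiation" workers (the updates are dummies).
    	go component(func(p *GridPoint) { p.Pressure *= 0.99 }, src, mid)
    	go component(func(p *GridPoint) { p.Temperature += 0.1 }, mid, done)

    	// Feed three grid points through the pipeline.
    	go func() {
    		for i := 0; i < 3; i++ {
    			src <- &GridPoint{Temperature: 288.0, Pressure: 1000.0}
    		}
    		close(src)
    	}()

    	for p := range done {
    		fmt.Printf("T=%.1f P=%.1f\n", p.Temperature, p.Pressure)
    	}
    }
    ```

    Each component sees a stream of grid points, and closing the channels shuts the pipeline down cleanly.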

    As I said, this is my first impression, and it is one of the topics I intend to discuss (I'll likely try to involve some of the regulars in the Go Nuts group).

  5. Cool.

    You will be amazed at how little climatologists (and scientists in general) understand of control systems. My knowledge there is very rusty, but I'd be interested in talking about it, and indeed in finding ways to express it without exposing people to Laplace transforms and the like.

    So an atmosphere model, or an ocean model, is typically a set of fluid columns. The science is all in the columns, but unless you get the large scale fluid motions right, you can't really get very far on most of the science. Still, as a programming exercise you can work on the columns, and that will probably be more interesting to most people. That's why I recommend an off-the-shelf dynamics package.

    The natural parallelization of a climate model is in space, not in physical process. Do you care about performance? Even a coarse climate model involves a whole lot of flops. Let me know if you want to take this offline.

  6. If your main goal is to show that Go is suitable as a scientific language, you should start by building a good, fast, versatile hydrodynamic core.

    As M. Tobis says, it is rather well-understood stuff, and it is easily comparable with textbook case studies. Full-blown climate models, with lots of parameterizations and zillions of tunable parameters, are notoriously difficult to compare and understand.

    Assuming that you are aware of the magnitude of the undertaking you face (I'd say no less than five years of work before you get something mildly useful to others, most of it spent studying and bringing in expertise from climate people, numerical scientists, etc.), one strategy may be the following.

    Start by building up a toy model that implements an oversimplified situation, for example a solver for the shallow water equations on a sphere.
    That will give you a test bed for your parallelization strategies and a reasonable benchmark for speed, and it will be a concrete piece of software useful for convincing people that you are not a daydreamer and getting them to join in. It will also be simple enough to be refactored and modified without too much pain when you discover that your initial design choices are flawed (and you will, oh you will...)
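    For reference, the shallow water equations in their simplest rotating, planar form (the spherical version adds metric terms and a latitude-dependent Coriolis parameter) are:

    ```latex
    \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u}
      + f\,\hat{\mathbf{k}}\times\mathbf{u} = -g\,\nabla h,
    \qquad
    \frac{\partial h}{\partial t} + \nabla\cdot(h\,\mathbf{u}) = 0,
    ```

    where \(\mathbf{u}\) is the horizontal velocity, \(h\) the fluid depth, \(f\) the Coriolis parameter, and \(g\) gravity. Two prognostic fields and the essential rotating-fluid dynamics, which is what makes them such a good test bed.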

    Furthermore, you need to make clear what kind of numerical solver you will use for the underlying equations (I guess you will solve the primitive equations for both the atmosphere and the oceans, or are you inclined towards a non-hydrostatic approach?).
    Will you use finite differences? Finite volumes? Finite elements? Spectral methods for the atmosphere? What kind of vertical coordinates? A collocated or staggered grid? How will you manage the pole singularities?

    These choices strongly affect the structure of the program, so it's better to discuss them before writing a single line.

    Keep posting...

  7. Thanks for the comment, Francesco.

    I'm looking into the equations behind the dynamic cores right now (studying a few papers and such). I'll likely have a post describing my thinking (which is being shaped by the advice given to me), and I hope it will spark further discussion. It will also cover what type of solver I'll implement, although that will likely be influenced by my choice to do this project in Go. I think spectral methods might be best suited to Go; to see why, I encourage you to see this paper.

    Just to clarify a bit, I have two reasons for doing this project. I believe it could help the understanding of climate science (in my wildest dreams it may even contribute to furthering climate science). I also think that Go could be very useful for scientific computing, and this project could help to prove that (or show that it won't work). It's this combination that I'm working towards, and I hope to have more contributors to help in the effort. As you say, it's a big effort, and if after one year I'm still the only one coding on this project, I'll likely stop working on it.

    As for toy models, check the tutorial posts and the GitHub repository. I've already added a few toy models, as described in those posts, and tried using them to demonstrate how the code will likely look in Go.

    I hope you'll keep following the posts. Let me know what you think.

  8. Don't worry about the real world dynamics first.

    Just do a simple dynamics problem on a sphere and show us the code and how it scales.

    If readability and cognitive accessibility are goals, a spectral model will defeat them, reducing your potential audience considerably.

    It may be the case that we are too stuck on a distributed memory paradigm and can produce far more elegant codes if the whole memory issue is abstracted away.

    But either way, it's not at all clear how goroutines scale in representing fixed-grid dynamical cores.

    There are lots of other fluid problems that are better suited to adaptive grids. Maybe you should put your attention there.

    If you have a well-thought-out story and a demo code that addresses this space, you may be able to get some traction.

  9. Actually, one of the mini-projects I'm planning to post about is extending some of the "toy" radiative model code to a sphere (with a simple heat storage/transfer mechanism, just to show the code works). This might answer some of what you're asking about (although probably not all of it).

    I mentioned spectral methods because I recently had an idea to use the approach from the "Squinting" paper (encoding a power series as a channel...) to represent Fourier series. This could be a great simplification for spectral methods, but it's entirely an idea that I hope will work (although if it does work as I hope, it will factor into my decision). I will keep in mind that it could reduce the potential audience (didn't I already throw that out the window by choosing Go?). Well, at least I'll try to attract as many as I can in a Go environment.
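    To show the flavour of the idea (a toy sketch only; the real Fourier machinery would need far more than this), a series becomes a channel of coefficients, and operations on series become goroutines that consume and produce such channels:

    ```go
    package main

    import "fmt"

    // PS represents a power series as a channel that yields successive
    // coefficients on demand -- the idea behind "Squinting at Power Series".
    type PS chan float64

    // Ones returns the series 1 + x + x^2 + ... (every coefficient 1).
    func Ones() PS {
    	c := make(PS)
    	go func() {
    		for {
    			c <- 1
    		}
    	}()
    	return c
    }

    // Add returns the term-by-term sum of two series.
    func Add(a, b PS) PS {
    	c := make(PS)
    	go func() {
    		for {
    			x, y := <-a, <-b
    			c <- x + y
    		}
    	}()
    	return c
    }

    func main() {
    	s := Add(Ones(), Ones()) // 2 + 2x + 2x^2 + ...
    	for i := 0; i < 4; i++ {
    		fmt.Print(<-s, " ") // prints: 2 2 2 2
    	}
    	fmt.Println()
    }
    ```

    Coefficients are computed lazily as they are demanded, which is what makes me hope the pattern could carry over to truncated Fourier series.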

    Do you think there are any short-term improvements to the "toy" code that could help attract people to this project? Is there anything I could demonstrate easily that would put some of the issues to rest (keeping in mind that I doubt I'll have the code running on a major HPC system in the near term)?

  10. Man who chases two rabbits does not eat.

    One of the reasons the existing code base is so problematic is that so many types of expertise are involved. No matter how smart you are, you cannot supply all of them.

    I would like to prove that climate models can be improved in terms of readability and other factors, at the cost of some performance. I am reasonably convinced that Python suffices, but I could be persuaded to go with another platform if there's enough momentum.

    You would like to prove that Go is suitable for some scientific problem, but you want to do this in a very large domain which you don't fully understand, for a purpose you haven't specified.

    I repeat my advice. Either 1) go for an EMIC (an Earth-system model of intermediate complexity) and abandon the full fluid dynamics, or 2) go for an existing dynamic core. Either way you can start addressing the physics relatively quickly, at the cost that in case 1 you will never have an intellectually convincing simulation and in case 2 you will not have a demonstration of Go.

    The third choice is to abandon your interest in climate and build a dycore. This is an impressive ambition. If you succeed, people will be interested in moving on to the rest of the physics ("the physics" as opposed to the dynamics). But you will learn little about climate for some time.

    If you choose this route, you will need to think about performance, because the dycore is the performance critical piece of a climate model. That being the case you will need to think about the platform. And that way lies the madness that swallows the software teams at the climate centers.

    If I ever manage to do this myself I will choose path 2 with Python.

    If I were you I would choose path 2 regardless. If you choose path 3, as you seem to be doing, I will not be interested in helping unless you produce a dycore that holds more appeal for me than the existing ones, and I would recommend the same allocation of attention to most others interested in climate.

    However, the "toy code" seems to be moving toward an EMIC. That (path 1) might make sense.

  11. I completely agree with M. Tobis' last post. You have 3 choices:

    1) A simple model with basic radiative processes and little or no dynamics

    2) A glue work putting together other models

    3) A hydrodynamical core (either generic, or specialized for the atmosphere or the ocean)

    1) is computationally simple and has great teaching value (you could set up a web site that allows live interaction with the model). It can even be useful for real science if it allows integrations on geological time scales (e.g. to investigate snowball/slushball Earth questions and the like). With this approach you can definitely find out whether Go is a good language for science.

    2) This approach will teach you a lot about climate modeling (or at least the kind of climate modeling required to investigate global warming issues). It will be revealing about the suitability of Go as a glue language, but it won't say much about its virtues for scientific computing.

    3) This is the most difficult path, but potentially the most rewarding one. You will face very tough design choices, which must be rather clear from the start. You will also need to target some high-end hardware: if you want to attract people from the climate community, the model must be competitive with the current ones. In fact, it might be worth adopting some unusual scheme or equation set: if you implement a very mainstream approach in Go, no one will take the pains to use your model (they will keep using the mainstream models).

    A final word on spectral models: even if you have a language that makes it easy to express series and mathematical operations on them (even with no performance penalty), any possible transform algorithm implies that each parallel unit must communicate with all the others. By contrast, finite differences, finite volumes, and other local schemes only need near-neighbour communication.
    Furthermore, a spectral dynamical core can be used for the atmosphere, but not for the oceans (because of their complex lateral boundaries).
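    To make the near-neighbour pattern concrete, here is a toy sketch (my own construction, with made-up parameters) of an explicit 1-D diffusion step in Go, where each cell is a goroutine that exchanges only boundary values with its two neighbours over channels:

    ```go
    package main

    import (
    	"fmt"
    	"sync"
    )

    const (
    	n     = 5   // interior cells; the domain boundaries are held at zero
    	steps = 2   // time steps
    	alpha = 0.1 // diffusion number: kappa*dt/dx^2
    )

    func main() {
    	// One channel per direction between each neighbouring pair.
    	// Capacity 1 lets every cell send before it receives, avoiding deadlock.
    	toRight := make([]chan float64, n-1) // cell i -> cell i+1
    	toLeft := make([]chan float64, n-1)  // cell i+1 -> cell i
    	for i := range toRight {
    		toRight[i] = make(chan float64, 1)
    		toLeft[i] = make(chan float64, 1)
    	}

    	u := []float64{0, 0, 1, 0, 0} // initial spike in the middle

    	var wg sync.WaitGroup
    	for i := 0; i < n; i++ {
    		wg.Add(1)
    		go func(i int) {
    			defer wg.Done()
    			v := u[i]
    			for s := 0; s < steps; s++ {
    				// Halo exchange: send my value to both neighbours...
    				if i < n-1 {
    					toRight[i] <- v
    				}
    				if i > 0 {
    					toLeft[i-1] <- v
    				}
    				// ...then receive theirs (zero at the domain boundaries).
    				var ul, ur float64
    				if i > 0 {
    					ul = <-toRight[i-1]
    				}
    				if i < n-1 {
    					ur = <-toLeft[i]
    				}
    				v += alpha * (ul - 2*v + ur) // explicit Euler update
    			}
    			u[i] = v
    		}(i)
    	}
    	wg.Wait()

    	for _, v := range u {
    		fmt.Printf("%.2f ", v)
    	}
    	fmt.Println()
    }
    ```

    Each goroutine talks to at most two others, and the capacity-1 channels keep neighbours in lockstep to within one time step; a spectral transform would instead force every unit to talk to every other.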
    Finally, the elegant constructs of the paper you mention can (I guess) be adapted to express other numerical schemes in an equally elegant and abstract way.
    In any case, you might wish to look at the documentation of ECHAM5, a well-known spectral atmospheric code, which is used as the atmospheric component in several climate models.

  12. Just a quick comment. Maybe tomorrow I'll have a few moments to add a bit more.

    Right now, I'm going to follow path 1. I think the project has a chance of morphing into path 2 or 3, but that depends on how successful I am at proving Go, as well as on the reception this project gets (and whether there are more contributors).

    As I said, I'll keep working on the toy code for now and keep trying to understand how climate models work. I appreciate your advice, and I'll keep it in mind as the project moves forward.

  13. Just wanted to clarify yesterday's comment a bit more. There are a few reasons to prefer path 1 in the short term (six months to a year). I figure the target audience for this blog is the technical public, although I do hope climate scientists will engage as well. I think following path 1 will attract a larger audience, which will hopefully push more people to contribute.

    What I'm thinking of doing now is setting specific short-term goals for what I can demonstrate by myself over the next six months. As I work towards those goals, I'll look at the progress to date and whether it has generated interest. Once these goals are reached, I'll be in a better position to judge where the project can actually go.

    Another aspect is that Go itself is evolving. I think it is extremely well suited to the toy programs I'm writing right now (and I'm having a blast writing them), but high-performance computing is an unknown at this point. While I think the basic design concepts will stay essentially the same, the performance of a program today and of the same program a year from now could be very different.

    Whatever happens, I'll keep posting my progress and my thinking on the project's future. I hope both of you will continue to follow these posts and give me your feedback.

    Michael, I have one more question for you. Do you think an EMIC would satisfy your goals regarding "readability and other factors"? Or do you see this as a project that can only be done with code capable of full climate simulations?