

Translations of this page: Belorussian (by Michail Bogdanov)

Last modified: Monday, 22-Oct-2007 17:19:03 EEST

How-To: Join Distributed Computing projects that benefit humanity

What if some of the world's estimated 650 million PCs (and 250 million households with broadband Internet) could be linked to assist scientists in solving critical real world problems? This is exactly what humanitarian grid computing is about!

Donate your computer's idle CPU time to humanitarian non-profit scientific research projects. Help find cures for diseases like cancer, AIDS, diabetes, MS and Alzheimer's, help predict the earth's climate change, or advance science: search for gravitational waves, help CERN build its latest particle accelerator, or help Berkeley search for extraterrestrial intelligence. So you WANT to contribute, but don't know where to start, or how to do it best? Or perhaps you're already contributing to one of the better-known projects, like SETI@Home or Folding@Home, and are looking for more?

There are many Distributed Computing (D.C.) projects requesting CPU time contributions. This is an effort to categorize the various humanitarian D.C. projects, mostly in terms of technical requirements (and in some cases, links to comments on their scientific merit).

I've spent considerable time looking into how to do it best. My priorities when selecting projects: possible value for the world and humankind, not-for-profit, results made available in the public domain, efficient use of my computers' resources, secure and easy to maintain. I'll share my conclusions here, for others to benefit too. From my (user's / CPU time donor's) perspective, I will divide DC projects into two broad categories: BOINC and non-BOINC. I always prefer BOINC, where possible.

Note: The humanitarian grid computing landscape is changing rapidly, as exciting new projects arrive and others pause or are discontinued. So a lot of the information out there on the Internet may be outdated, even if it was written 6 months ago. This document contains the latest info on DC as of May-2007, and everything presented here has been checked to the best of my ability.

  1. BOINC
    BOINC (Berkeley Open Infrastructure for Network Computing) would be my preferred method for experienced computer users who plan to contribute to many projects. It is free, open-source software that works under Windows (XP/2K/2003/NT/98/ME), Linux / FreeBSD / Unix and Mac: a "Distributed Computing Framework" created by Berkeley Univ. Once you install and set up BOINC, you can "subscribe" to as many projects as you wish and automate everything. Participants join one or more of the BOINC projects by registering for an account (name/password) at a project's site.
    BOINC Info: BOINC Home, BOINC download, Debian/Ubuntu Linux BOINC, Wiki: Installing and participating in BOINC projects, OcUK BOINC FAQ, BOINC beta sw

    Quick summary about BOINC: It is software which allows you to participate in multiple projects, and to control how your PC's time is divided among these projects. Projects are independent, and each maintains its own servers. Anyone can create a BOINC project. The BOINC developers (Berkeley University, California) have no control over the creation of BOINC-based projects and do not necessarily endorse them.
    BOINC stats: 1,000,000 people and 2,000,000 computers in 234 countries as of Jun-2007
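To make "control how your PC's time is divided" concrete, here is a minimal Python sketch. It assumes each attached project is given a numeric "resource share" and that, over the long run, a project's slice of idle CPU time is simply its share divided by the sum of all shares. The share values are made up for illustration, and the real BOINC scheduler is more involved than this proportional rule.

```python
def share_fractions(resource_shares):
    """Return each project's fraction of the total resource share."""
    total = sum(resource_shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {name: share / total for name, share in resource_shares.items()}


# Hypothetical share values, chosen for illustration only.
shares = {"Rosetta@Home": 200, "World Community Grid": 100, "Einstein@Home": 100}
fractions = share_fractions(shares)

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.0%} of idle CPU time")
```

Doubling one project's share relative to the others doubles its slice, without having to touch the settings of the other projects.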

    1. Life sciences (mostly computational biophysics) BOINC projects
      1. Rosetta@Home protein prediction and design project (R@H at Wikipedia, about, science faq, medical relevance, daily progress journal) by Baker Lab at the University of Washington, USA. It is developing and applying the Rosetta protein prediction and design software (licensed free of charge for academic use).
      2. Proteins@home protein prediction project by Ecole Polytechnique, France (WinXP client only)
      3. TANPAKU protein prediction project by Tokyo University of Science, Japan
      4. Predictor@Home (scientific update 1-Mar-06) protein prediction project by The Scripps Research Institute, USA.
      5. World Community Grid (WCG, see howto join WCG via BOINC) an IBM philanthropic initiative. IBM pays the bills for telecom and hardware, but the decision about which not-for-profit projects to sponsor, is taken by an international Advisory Board. WCG runs 5 projects:
        1. Human Proteome Folding (HPF, HPF at Wikipedia, read about HPF) by Institute for Systems Biology (ISB), USA. Phase-1 applied Rosetta v4.2x software on the human genome and finished Nov-05. HPF Phase-2 (HPF2) applies Rosetta v5.x in "full atom refinement" mode. (HPF-2 update 23-Jun-06, update 13-Mar-06, update 17-Jan-06 and HPF-1 recap 22-Nov-05)
        2. FightAIDS@Home by Olson Lab at The Scripps Research Institute, applies the AutoDock sw to test how drugs interact with various forms of HIV Protease. Phase 1 will screen 2,000 compounds against 270 variant strains of HIV. The compounds are selected representatives of a number of different groups. It joined WCG in Nov05 and BOINC in Jan06. Latest 19-Jan-06 update
        3. Help Defeat Cancer research on tissue microarrays (TMA) which promise to help doctors in selecting proper treatment strategies and providing accurate prognosis for cancer patients. Started July-2006
        4. The Genome Comparison Project by Oswaldo Cruz Institute, Fiocruz, Brazil. Aims at improving protein functional annotation in databases.
        5. Help Cure Muscular Dystrophy by Decrypthon, a partnership that includes AFM (French Muscular Dystrophy Association) and CNRS (French National Center for Scientific Research)
      6. SIMAP (the project currently has no work) protein similarities database, a joint project of GSF National Research Center for Environment and Health, Neuherberg, and Technical University Munich, Center of Life and Food Science Weihenstephan, Germany
      Note: italics = project's BOINC client is still in beta-test phase, or on hold

      7. Docking@home the project aims to further knowledge of the atomic details of protein-ligand interactions and, by doing so, will search for insights into the discovery of novel pharmaceuticals. ALPHA-test
      8. Folding@Home research on protein folding kinetics, using Molecular Dynamics, by Stanford University, USA (Note: current Folding@Home software runs stand-alone, but F@H has beta software to join the BOINC platform). Learn more at F@H at Wikipedia, What is F@H ACTUALLY working on? It has been running for over 5 years and is currently the highest-profile DC project after the legendary SETI@Home. Google included it as part of their GoogleToolBar in 2003-2004, it has enjoyed publicity and has been installed on over 1.5 MILLION PCs, of which over 200,000 are still actively computing for F@H, offering a sustained power of over 200 TeraFLOPS.
      9. MalariaControl.Net stochastic modelling of the clinical epidemiology and natural history of Plasmodium falciparum malaria.

      Protein research is particularly important because most diseases are manifested at the level of protein activity. It's also one of the most challenging computational problems of our era (in terms of raw power). Both the world's #1 (IBM's Blue Gene/L, finished 4Q05) and #2 (IBM's Blue Gene) supercomputers have been built mainly for protein research. Blue Gene/L has a sustained speed of 100 TeraFLOPS, whereas volunteers from all over the world already contribute over 200 TeraFLOPS (over twice the computing power of the world's #1 supercomputer) just to Stanford's Folding@Home project and another 200+ TeraFLOPS to the SETI@Home project! Grid computing is really powerful!

      Goals and differences of life science projects

      If you visited the project sites above, you saw that most of them talk about "protein folding". Are they the same, competing, or complementary? How are they involved with cures for diseases like cancer or AIDS? These are the questions everyone asks.

      Quick Summary:

      Most diseases are manifested at the level of protein activity. A "bottleneck" in medical research for many diseases is that the function/role of many (approx. 60%) proteins in the human body still remains unknown. In fact, the "Find-a-Drug" project closed in Dec-05 after running for 4 years, because it ran out of new proteins of known functionality to check via "virtual screening" (see FAD to close). First the 3D shape of proteins has to be identified, from which scientists expect to learn about the function of these proteins, as the shape of a protein is inherently related to how it functions in our bodies.

      Protein 3D shape identification is currently being done "experimentally" in a laboratory (via X-Ray Crystallography and NMR) at great cost in time and money (btw, some bigger proteins are difficult to study "experimentally"). "Protein prediction" projects such as Rosetta@home, TANPAKU, Proteins@home, Predictor@home (as well as the currently inactive Distributed Folding) are all developing algorithms to determine protein 3D structures "mathematically" (via computer simulation, also known as "protein structure prediction" or "protein folding"), which will speed up progress immensely. The same software tools might also eventually be used to design new complex proteins that will inactivate pathogenic organisms (e.g. viruses like common "flu" and HIV/AIDS, or bacteria) or repair damaged DNA (e.g. "gene therapy" for curing cancer). Other software performs "docking" checks of potential drugs (small molecules) to a target protein ("virtual screening").

      Projects like Human Proteome Folding apply the "protein prediction" algorithms developed by one of the aforementioned projects (HPF is using Rosetta), to generate 3D structure predictions for selected proteins (e.g. HPF2 studies cancer biomarkers), for biologists and biomeds to look at and use while "annotating" proteins (deciding what they do, which cell processes they are involved with).

      SIMAP computes similarities between proteins, since similar proteins that evolved from a common ancestor (so-called orthologs) usually perform the same function.

      For disease treatment, in the short term, medical scientists search for drugs via "virtual screening" / "docking" of "ligands" (small water-soluble chemical molecules which can potentially be used as drugs, i.e. which will interact with the protein behind a disease to inhibit or activate it) on known proteins, which have already been identified for diseases. Such projects include CureCancer, FightAIDS@home, Docking@Home, ComputeAgainstCancer, Find-a-Drug (inactive) and D2OL, using "docking" software (e.g. DOCK, LigandFit, AutoDock, THINK, Rosetta etc).

      The next step in curing disease (perhaps years away) is to design new (artificial) proteins to perform functions, e.g. Rosetta@home's work for cancer with redesign of a DNA-modifying protein (gene therapies).

      Folding@Home uses molecular dynamics (laws of physics) to study the process of protein folding (the "kinetics") and understand misfolding (aggregation) diseases such as Alzheimer's. Also to develop new, more accurate protein-drug "docking" methods.

      Life science projects explained in detail:

      Rosetta@Home, Proteins@home, TANPAKU, Predictor@home all seek to predict the 3D structure of yet unknown proteins from their amino acid sequences, so biomed scientists can deduce each protein's role / functionality in cell processes (more details later in the "Relevancy of protein prediction projects to cures" section). The difference is in the approach to solve the problem, e.g. Rosetta uses energy functions to find the lowest or most stable state and Predictor uses Monte Carlo simulations using a knowledge based force field based upon a simplified lattice model.

      Folding@Home is an advanced Computational Chemistry project which studies how proteins fold (What is F@H ACTUALLY working on?). It's interested in the chemistry of unfolded, partially folded, and completely folded ("native") proteins, as they relate to one another. It does not attempt to predict the final 3D protein structure from the amino acid sequence, but tries to simulate the process of folding at the pico- to microsecond timescale, down at the protein molecule level, using Molecular Dynamics (the laws of physics) to show scientists what actually happens during folding. So, F@H is computing protein folding "pathways" ("trajectories"), like an animated movie (e.g. "Toy Story"): frame after frame, lasting from microseconds up to a few milliseconds, trying to learn enough about the critical parts to help create a better model of the process of protein folding, as well as understand misfolding (aggregation related) diseases like Alzheimer's. This approach requires enormous amounts of computing power to simulate even small, fast-folding proteins, so currently it's not usable for protein 3D structure prediction.

      SIMAP: Because of the huge number of known protein sequences in public databases, it became clear that most of them will not be experimentally characterized in the near future. Nevertheless, proteins that have evolved from a common ancestor often share the same functions (so-called orthologs). So it is possible to infer the function of a non-characterized protein from an ortholog with known function.

      HPF (Human Proteome Folding) applies 3D structure prediction software (Rosetta in particular). Note that Phase-1 of HPF has finished and in Jan-06 the project went into Phase-2 (HPF2) [more]. Here is an extract from the discussion "Is there any result in treatment or we just resolve more and more proteins?":

      "The Human Proteome Folding project is basic medical research. We are given some fundamental components of cells (proteins of unknown function) and we try to deduce their shape, then from this deduce which other proteins they interact with, and how. It is like pouring the components of an "Erector set" (editor: construction toy) onto the living room floor and trying to figure out what goes with what. The goal is to figure out the functional networks that drive basic cell processes. Once you have identified the function of a protein you can:
      1. Select it as a target for a drug to interfere with its function
      2. Figure out how it works and design a drug to duplicate the effect
      3. Develop a diagnostic test to detect the concentration of that protein in order to measure the level of activity

      Without the protein information, these three things are very important objectives that can only be accomplished by mass screening of a vast number of chemical compounds, hoping for a lucky breakthrough. Even with this information, a great deal of work, skill and luck is required to develop a drug.

      The HPF project can provide very useful information for drug development, but it is aimed at basic understanding that can then be used to develop drugs. We are providing the shape information. Scientists studying the databases with this structural information will predict the function of the proteins (annotate the proteins)."

      Relevancy of protein prediction projects (Predictor@Home, Proteins@home, Rosetta@Home, TANPAKU etc) to finding cures for cancer, AIDS, diabetes, Multiple Sclerosis and other diseases [source source2]

      The reason that it might be difficult to find a project that works directly on curing cancer is that cancer is such a huge, complicated disease that so many things are involved in. One might study how a growth factor gene can run out of control, or the genetics behind tumor suppressors, or the signaling involved in the programmed cell death pathway, or the environmental factors that lead to mutation, or a dozen other things that contribute to cancer, but it's all of these combined in a unique way that can lead to different types of cancers.

      Cancer indeed is the result of mutations at the DNA level, often multiple mutations (what's known as the "2-hit hypothesis"). However, in order for these mutations to actually cause cancer, they must activate a gene expression program that depends on the action of several proteins. Therefore, targeting the DNA for repair (gene therapy) is one approach for cancer treatment, but another approach (and perhaps more feasible in the short term) is to target the proteins involved in uncontrolled cell growth.

      Therefore, some of the most promising recent treatments for cancers come from drugs that target specific proteins involved in cancer (e.g. drugs that target the tyrosine kinase activity of growth factor proteins). The greatest success in such drug discovery has come from screening large libraries of molecules derived from computational chemistry. This approach is very time consuming and costly and not even guaranteed to work. Another approach to drug discovery comes from "virtual screening" or "rational design" based on experimentally solved protein 3D structures. This requires first an existing protein structure and a way to evaluate the effectiveness of a drug interacting with the structure. Developing methods to predict protein structure in a rational way directly relates to drug discovery in this way, not only for designing treatments for cancer, but for many other diseases.

      As you can see, it is difficult to directly relate projects to predict protein structure to cancer, but they are working towards developing a technology that will aid the discovery of drugs to help treat such diseases.

      More info on "virtual screening":

      The research centers on proteins that have been determined to be a possible target for e.g. cancer therapy. Through a process called "virtual screening", special analysis software will identify molecules that interact with these proteins and will determine which of the molecular candidates has a high likelihood of being developed into a drug. The process is similar to finding the right key to open a special lock by looking at millions upon millions of molecular "keys". If we know the structures of proteins responsible for diabetes, for example, and whether their activity needs to be increased or decreased, we can search for small molecules that either activate or inhibit the protein.
      Learn more about "rational drug design" and "virtual screening": Wikipedia: drug design, Wikipedia: molecular docking.
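The "lock and key" picture of virtual screening can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the binding site and the ligands are short strings of made-up features, and `toy_score` is a hypothetical stand-in for a real docking score (it is not how AutoDock, LigandFit or any other docking package works). What carries over to real screening is the workflow: score every molecule in a library against one target, then keep the best-ranked candidates for laboratory testing.

```python
# Which ligand feature fits which site feature (purely illustrative rules).
COMPLEMENT = {"+": "-", "-": "+", "h": "h"}


def toy_score(target_site, ligand):
    """Count positions where the ligand feature complements the site feature."""
    return sum(
        1
        for site_feature, ligand_feature in zip(target_site, ligand)
        if COMPLEMENT.get(site_feature) == ligand_feature
    )


def screen(target_site, library, top_n=2):
    """Rank every molecule in the library and return the names of the best ones."""
    ranked = sorted(
        library,
        key=lambda name: toy_score(target_site, library[name]),
        reverse=True,
    )
    return ranked[:top_n]


site = "+-h+"  # hypothetical binding site
library = {    # hypothetical ligand library
    "mol_a": "-+h-",
    "mol_b": "++hh",
    "mol_c": "-+--",
    "mol_d": "hhhh",
}
hits = screen(site, library)
print(hits)
```

In a distributed project the library is split into batches, each volunteer PC scores its own batch, and only the scores travel back to the project servers.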

      Protein projects background info (you can skip it, if you're not interested in the nuts and bolts of research)

      Baker (Rosetta@home head scientist) explains:

      "With the completion of the Human Genome Project, we are now capable of predicting what the amino acid sequences of different proteins in the human body will be. However, while we know the order of the amino acids, proteins do not remain in two-dimensional shapes inside the human body. They "fold" into different three dimensional shapes for various proteins in order to serve their functions inside the human body. A protein with a distinctive amino acid chain does not simply fold in a random manner, but usually folds in a specific configuration. The force behind this phenomenon is believed to be the Hydrophobic Effect. This effect is due to the fact that certain groupings of amino acids are either attracted or repelled by water, therefore they will attempt to either maximize or minimize their exposure to water respectively. Since the fluids that living bodies are composed of primarily consist of water, these amino acid chains fold in on themselves with the chains repelled by water tending to be part of the center of the protein to minimize the amount of their surface area exposed to the body's fluids. Those amino acid chains that are attracted to water tend to be part of the surface of the protein in order to increase the proportion of the chain exposed to the body's fluids. Since the forces that act on a protein depend on what the particular amino acid sequence of a protein is, a specific protein will usually 'fold' in the same manner. Due to the complexity of the forces acting upon amino acid sequences large enough to form a protein, it is extremely difficult to predict how the protein will fold, but this knowledge is vital for determining how the protein functions. Devising a way to quickly and accurately predict how a protein will fold could potentially allow researchers to devise treatments for diseases such as Alzheimer's and AIDS.

      By using molecular dynamics, it is possible to attempt to use the basic laws of physics to simulate the folding process for a particular protein. (For example the Folding@home Project.) However, such an approach is limited in its utility since it requires an enormous amount of processing power to simulate even a simple protein's folding behavior. The Rosetta@home project takes a different approach by utilizing a newly developed software algorithm to try to predict the shape that a protein would be most likely to fold into. An additional piece of software analyzes the projected results and determines which of the various projected results is most likely to be correct. By utilizing a distributed computing project, it is possible to quickly create a database of billions of possible structures for a protein, and thereby obtain an accurate picture of what that folded protein would look like."
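The "generate many candidate structures, then pick the most plausible one" idea from the quote can be sketched in a few lines of Python. This is a toy under loudly stated assumptions: a "conformation" is just a short list of angles, `toy_energy` is a made-up function whose minimum is known in advance, and candidates are drawn uniformly at random. It is not the Rosetta algorithm or its energy function; it only shows why a very large number of independent samples helps.

```python
import random


def toy_energy(conformation):
    """Hypothetical energy: lower means more stable, minimum when all angles are 0."""
    return sum(angle * angle for angle in conformation)


def sample_conformations(n_samples, n_angles, seed=0):
    """Draw random candidate conformations (each volunteer PC could use its own seed)."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-180.0, 180.0) for _ in range(n_angles)]
        for _ in range(n_samples)
    ]


candidates = sample_conformations(n_samples=1000, n_angles=3)
best = min(candidates, key=toy_energy)  # the "analysis" step: keep the lowest energy
print(f"lowest energy among {len(candidates)} candidates: {toy_energy(best):.1f}")
```

Because every candidate is generated and scored independently, the work splits naturally across many volunteer computers; only the best few results need to be sent back.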

      Good scientific reading for volunteer "folders": Wikipedia: Protein structure prediction

      More on Protein study projects (Rosetta, Predictor, WCG, Folding@Home, SIMAP) differences: Background about Rosetta software and its use in ISB's HPF project, BOINC-Wiki.

      Predictor@home and Rosetta@home differences from forum posts R@H forum and P@H forum. Explanation of Predictor@Home algorithm.

      Article Gene Machine in WIRED.com (2001) gives "background info" and talks about world's biggest supercomputer, built for protein folding. The Protein Hunters, WIRED Apr-01. Protein Design Processes biological warfare defense project, by DARPA. Protein Folding Overview articles on the subject of protein folding and disease.

    2. Physics, Climate, SETI etc BOINC projects
      1. Einstein@Home searches data from the Laser Interferometer Gravitational wave Observatory (LIGO) in the US and from the GEO 600 gravitational wave observatory in Germany for signals coming from rapidly rotating neutron stars, known as pulsars.
      2. ClimatePrediction.Net aka CPDN. Quick Summary: The CPDN project runs a large Monte Carlo type simulation of Global Circulation Model (GCM) parameters, in order to improve climate predictions for the 21st Century, and to quantify the errors involved. Although there is currently a broad agreement that temperatures will rise, predictions of the magnitude of this temperature rise vary tremendously. Climate change is perceived to be a main threat to our civilisation. This is a collaboration between a number of universities and organisations in the UK. The project went live in September 2003. In Feb-2006, the BBC joined. Read more from the BBC: BBC Climate Change, CPDN project on BBC news. Don't join this project if your PC runs only a few hours per week.

        The goal of ClimatePrediction.Net is to predict the evolution of the worldwide climate in the 21st century. The usual way to do this is to start with the best weather measures, the best estimates of the parameters that govern the evolution of the climate, and to run the simulation on a super-computer.

        The problem with this technique is that it ignores the main problem: we don't have precise measures of all the parameters of the current situation and, worse still, we only have an imprecise estimate of some crucial simulation parameters. For instance, the rate at which CO2 is absorbed by oceans is not known with precision. This is one of the reasons why we currently have contradictory long term climate predictions. If the parameters are wrong the simulation is worthless.

        In fact, we can get information out of a simulation result even if it starts from parameters that are not really accurate. But we need many of these: thousands, tens of thousands, maybe even more. By analyzing the results of all these simulations, one can get a better understanding of the effect of each parameter, determine the most likely result, and also determine which parameters are the most important.

        But not even a super-computer is up to the task. Hence the idea of running each simulation on a separate computer, using spare CPU cycles and the Internet to collect the results. Here is how it works: the ClimatePrediction team distributes the simulation software, reference climate data and a set of parameters to be tested. The program then simulates the climate evolution over a period of model-decades and returns the results to the ClimatePrediction team.

        More information on CPDN scientific strategy.

        Advantages: Useful, practical project with some beautiful graphics of the GCM simulations.
        Disadvantages: The client is large and project data can be over 1 GByte. One block of data takes a long time to compute: a 2.4GHz Pentium-4 will take about 3 weeks (running 24/7).
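The ensemble idea behind CPDN (many runs, each with a different guess for the uncertain parameters, analysed together afterwards) can be sketched in Python. The sketch is illustrative only: `toy_climate_model` is a made-up one-line formula, and the parameter names and ranges are assumptions, not values used by the real climate model. Each loop iteration plays the role of one volunteer PC running one simulation.

```python
import random
import statistics


def toy_climate_model(ocean_uptake, sensitivity):
    """Hypothetical stand-in for a climate run: returns warming in arbitrary units."""
    return sensitivity * (1.0 - ocean_uptake)


def run_ensemble(n_runs, seed=0):
    """Run the toy model once per randomly perturbed parameter set."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        params = {
            "ocean_uptake": rng.uniform(0.2, 0.5),  # assumed uncertainty range
            "sensitivity": rng.uniform(2.0, 5.0),   # assumed uncertainty range
        }
        results.append((params, toy_climate_model(**params)))
    return results


ensemble = run_ensemble(n_runs=5000)
warmings = [warming for _, warming in ensemble]
print(f"mean warming: {statistics.mean(warmings):.2f}")
print(f"spread (stdev): {statistics.stdev(warmings):.2f}")
```

The spread of the collected results is the interesting output: it gives a rough measure of how much the prediction depends on parameters that are not known precisely.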

      3. SETI@Home, the legendary project of Berkeley University, started in 1999. The largest project in history, with over 5 million users. The goal of the project is to discover signs of extraterrestrial intelligence (there is so precious little of it on earth). Deep-space radio signals collected by the Arecibo radio telescope in Puerto Rico are broken down into work units (WUs) and processed by the client machines. SETI is now working to get the "Paul Allen" radio telescope array (named after the co-founder of Microsoft, who made a large donation) online.
      4. Leiden Classical a Desktop Computer Grid dedicated to general Classical Dynamics
      5. LHC@Home (down since Sep-2006) helps with the construction of the LHC (Large Hadron Collider), which will be the largest and most powerful particle accelerator ever built. LHC@home simulates how the particles travel through the 27 km long tunnel. With the help of the calculated information, the magnets that control the beam can be calibrated with greater precision. LHC@home was created by, and is based at, CERN, which is located in Geneva, Switzerland.
      6. Quantum Monte Carlo (QMC@home) develop the Quantum Monte Carlo method for general use in Quantum Chemistry. BETA-test
      7. Spinhenge@home research on nano-magnetic molecules. In the future these molecules are expected to be used in local tumor chemotherapy and to develop tiny memory modules. BETA-test
      8. Nano-Hive@Home to simulate large-scale nanotech systems BETA-test

    There are many more BOINC projects; the ones listed above are just the humanitarian ones I've had a chance to look at more closely so far (breaking encryption algorithms, for example, may be an interesting challenge, but it's not one that benefits humanity). I'm sure there are other projects worthy of your CPU's idle time. You can find more in the Links section at the end.

    Some people become so enthusiastic that they create "computer farms", built to "crunch" (compute) as much as possible, and compete against each other to earn the most credits (work done) for their favorite projects.

    Learn more: Unofficial BOINC Wiki, BOINC page at Wikipedia, SETI@Home page at Wikipedia.

  2. non-BOINC
    Projects requiring stand-alone agent software

    Advantages: Easier to set up for non-technical users (no per-project registration and configuration required). Useful if you want to contribute to just one project, or if your favorite project is not available via BOINC. If you choose to go this way (non-BOINC), it should be either because you're not computer savvy or because you really want to contribute to a specific non-BOINC project.

    Disadvantages: Unlike BOINC projects, stand-alone agents are often mutually exclusive: e.g. if you're running FightAIDS and try to install Oxford's anti-cancer project, the installer will ask you to un-install WCG (see comment), so you can't run both projects. Also, in some cases in the past, companies used a small (1%) part of the aggregate CPU time to run commercial projects (effectively becoming "brokers" of CPU time) and paid some of the proceeds to users or donated it to charities on their behalf.

    Most of these projects are backed by big companies e.g. IBM or Intel for marketing support (i.e. "visibility" and company "public relations"). I can't help noticing that commercially backed projects tend to put catchwords like "cancer" or "AIDS" in the title, which really get people's attention...

    The projects which run on "proprietary" DC platform software (but all except grid.org/Oxford's are already or soon will be on BOINC as well) include:

    1. Folding@Home run by Stanford Univ - study of protein folding (science behind the project). Currently the highest-profile DC project on the Internet after SETI@Home. Support from Intel, Google and others back in 2002-2003 allowed it to build a very large user base and a thriving user community (forums, teams competing to donate the most CPU time etc). Setup instructions. Intro in Greek written by myself. Current Folding@Home software runs stand-alone, but F@H is preparing a beta to join BOINC. F@H at Wikipedia.
      F@H stats: 522,000 people, 180,000 active CPUs (out of 1.5 million) as of Jul-06
    2. World Community Grid (formally IBM - IBM pays the bills for server hardware and networking but the decision on which projects to support is taken by an Advisory Board), offers 2 DC platforms: either United Devices software (Windows clients only) or BOINC software (both Linux and Windows). I recommend joining WCG via BOINC. The WCG Windows agent, based on UD technology, might be easier to install (appropriate for less experienced computer users) but BOINC offers more flexibility. WCG currently hosts 5 projects (explained in more detail in the BOINC-Life Sciences section of this document):
      • Human Proteome Folding
      • FightAIDS
      • Help Defeat Cancer
      • Genome Comparison
      • Help Cure Muscular Dystrophy
      WCG stats: 200,000 people, 370,000 CPUs (active?)
      IBM is installing the WCG agent (automatically, as "recommended software") at their corporate sites. In a way, one can think of WCG as a "commercially" run / supported DC Framework for multiple projects (currently hosting 5 projects). Which projects receive support gets decided by an advisory board. It is run and maintained by IBM (when you join, you actually enter into a legal contract with IBM). IBM says control of the project will eventually be transferred to a suitable non-profit entity once the project gets firmly on its feet.

      quote from WCG End User License Agreement

      "WELCOME TO WORLD COMMUNITY GRID. THIS IS THE AGREEMENT THAT APPLIES TO YOUR PARTICIPATION IN WORLD COMMUNITY GRID. IT IS BETWEEN YOU AND INTERNATIONAL BUSINESS MACHINES CORPORATION (WE WILL REFER TO OURSELVES AS "IBM" OR "WE"). IT IS ANTICIPATED THAT EVENTUALLY IBM WILL TRANSFER WORLD COMMUNITY GRID TO A NON-IBM OWNED NOT-FOR-PROFIT ENTITY. SUCH A TRANSFER WOULD CONSTITUTE AN IBM CONTRIBUTION. IF AND WHEN THAT OCCURS, THIS AGREEMENT WILL ALSO APPLY TO THAT ENTITY."

      WCG is available as a BOINC project, and now one can also fine-tune CPU resources or opt out of its individual sub-projects (I don't mind, both HPF and FightAIDS are equally good imo). Normally, UD/WCG and BOINC/WCG users will get work-units of both projects automatically.

    3. grid.org [inactive] (formally United Devices, which established grid.org as a "proof of concept" for its grid software, to demonstrate the viability and benefits of large-scale Internet-based grid computing). Its main project is from Oxford on cancer. Software works on Windows (98/ME/2K/XP) only (no Linux or Mac).

      grid.org stats: 1.3 million people, 3.5 million CPUs (but I wonder, how many of them are active?)

      • Cure Cancer [completed], aka Screensaver-Lifesaver. Ran from 2001 to Apr-2007, by Oxford University Chemistry Dept (UK) with NFCR (National Foundation for Cancer Research), backed by United Devices and Intel. The project used the LigandFit software to test 3.5 billion chemicals against the cancer proteins of leukemia and pancreatic cancer, two of the most deadly types of cancer. So essentially, this was a huge jigsaw puzzle being solved by brute force. As of Jan-06, in Phase 2, it was refining prior "hits", effectively letting the researchers know what NOT to spend their time, money, and resources on. On the same subject (but older): Q&A info from NFCR, and the NFCR-Oxford-UD Screensaver-Lifesaver project homepage. Latest news, Dec-2005:
        "Using Screensaver Lifesaver / LigandFit technology, we have identified three promising leads Mol597, Mol238 and Mol628 for urokinase inhibition. Based on the docking studies these molecules show high potential as uPA inhibitors. These molecules can be used as lead molecules for the design of better uPA inhibitors as potential anti-cancer therapeutic agents. We are synthesizing these molecules and testing them for biological activity."
        Emailed feedback from Prof. Richards, 30-Nov-05 (source):
        The screensaver project has produced a large number of 'hits'; molecules predicted to be potential inhibitors of proteins and possible leads for drug discovery. The bottleneck to exploitation of these predictions is the potentially costly synthesis of the compounds and their biological testing.
        For two of the series we have passed that stage and the results are very encouraging.
        One of the more interesting and challenging targets is the phosphatase.
        Much of biology is a balance between phosphatases and kinases. The latter have been important commercial targets for anti-cancer drugs for many years, but the phosphatases have proved more difficult as for one thing the binding site into which the drug must bind is rather ill-defined. The project produced some 128,000 hits for this target. We analysed the results and produced a list of 400 good hits of differing chemical types. These were synthesized and tested. Over 40 [~10%] proved to be genuine inhibitors which is very good by industry standards and what is more they are uncharged molecules which are very different from known inhibitors. We are trying to find pharmaceutical companies to take this further.
        For a second series where the target is urokinase plasminogen activator [implicated in prostate cancer] we have again had hits synthesized and tested, this time by collaborators at the Arizona Cancer Center. This too has produced encouraging novel active compounds. These results were presented at the American Association for Cancer Research and a poster giving details is being put on the Oxford Chemistry web site.
      • Human Proteome Folding. Phase I ran on both WCG and Grid.org; the current Phase II runs only on WCG (and via BOINC/WCG, which is how I run it).
    5. Find-A-Drug Project [inactive] was the first mass-scale DC health sciences project, looking for drugs for diseases such as cancer, AIDS, multiple sclerosis, SARS, malaria, respiratory diseases, mad cow and others. Again the method was a mass-screening of drug-like molecules against disease proteins, using the THINK software. Started Apr-02, closed 16-Dec-05.
      "We have also targetted most of the recognised protein targets for the major project areas. [...] Our experience suggests that it will be difficult to find collaborators who will be interested in the results of targetting proteins whose biological function is unknown or of little therapeutic interest." -Keith Davies, FAD
      The first phase of the internet-based anti-cancer computing project "Screensaver Lifesaver", aka CureCancer, used the THINK science software and was hosted by United Devices, in collaboration with the National Foundation for Cancer Research (NFCR), Oxford University and Intel Corporation. The second phase of the NFCR+Oxford+UD Cure-Cancer/Screensaver-Lifesaver project chose to use the LIGANDFIT software. At that point, Keith Davies, who is an honorary research fellow at the University and the founder of Treweren Consultants, which developed the THINK software, established Find-a-Drug to continue research using the THINK software.
    6. computeagainstcancer.org (defunct? site/news not updated since 2003). Launched Jul-2000, sponsored by PARABON Computation, hosts 3 projects. Apparently the company makes money by selling a percentage of the aggregate CPU resources to commercial interests. "By choosing to take part in Compute Against Cancer rather than Parabon's standard provider program, you simply choose to donate the money we would pay you for your computer's idle time to the charity of your choice." (from their FAQ)
    7. D2OL: The Drug Design and Optimization Lab (D2OL) works to discover drug candidates against Anthrax, Smallpox, Ebola, SARS and other potentially devastating infectious diseases or bioterrorism agents. See also: explanation of the "docking" method for pathogens.
    8. Distributed Folding project - currently inactive

Note: The stand-alone projects mentioned above are just the most widely followed (over 100.000 participants). There are many more "stand-alone" distributed computing projects available. You can find more at the distributedcomputing.info and www.volunteerathome.com sites, which have extensive, helpful project profiles and also explain BOINC, WCG and UD. Hopefully more projects will become available in the future under the BOINC platform, which manages them to achieve optimal performance.


Observations:

I can't help but notice that the life science projects lag behind SETI's userbase of 5 MILLION. To the credit of Berkeley, they developed BOINC (Why classic SETI@home is closing down), so SETI@Home's HUGE userbase can now opt to donate CPU time to other projects easily, when SETI hasn't much work (as is the case lately), which is strongly encouraged by Berkeley/SETI folks. Any project can now tap into this vast pool of computing resources.

Still, trends probably indicate that the "donating user" community is focused on more "technical" projects, i.e. SETI+CPDN+Einstein+LHC consume about 90% of overall BOINC CPU time. I think one reason is that most "life science" projects came online via BOINC only recently; another is concern about potential exploitation of biomedical research results by BigPharma. Life science projects should try to address these concerns (read the heated discussion at the UD-Intel cancer forum, the request for feedback at the HPF forum, and more doubts and answers). Regular feedback is important for "troop morale" (still, plenty of CPU power will go to projects with the prettiest graphics and fancy stats). Personally, I want transparency and full disclosure.

Issues to consider when choosing projects:

BOINC Wiki CPDN: How to decide on Resource Share offers more thoughts on the matter.

My choices

How did I decide which of all BOINC projects to contribute to? Due to most "life science" projects having joined BOINC fairly recently (e.g. Rosetta@home came online out of beta Nov05) they are under-represented in terms of CPU time. Looking at late-2005 stats, about 90% of donated CPU time via BOINC goes to SETI+CPDN+Einstein+LHC. Wrt SETI, it is my opinion that if one wants to look for extra-terrestrial intelligence, one should be primarily examining the UFO evidence, rather than trying to pick up radio signals, because an advanced civilisation might be using all sorts of physics that we don't know about yet, such as quantum entanglement (although certainly radio-astronomy is a great window to the universe).

Anyway, SETI@home has plenty of CPU power already, enough to process every WU 5 times. It's not constrained by available CPU power. At this point SETI@home could use more funds (monetary donations), rather than more CPUs.

Wrt physical sciences projects, I think understanding gravity will allow humanity its next big leap forward, and a better understanding of the universe. So both Einstein and LHC are important.

Human Proteome Folding (WCG/HPF), which applies the Rosetta software to the human proteome, also seems useful. See the WCG forums discussion (illuminating). Unfortunately, WCG's (IBM's) current redundancy (initial replication, quorum) settings are just too wasteful.
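
To see why redundancy settings matter, here is a minimal sketch of the arithmetic. It assumes a deliberately simplified model in which every work unit is sent out a fixed number of times and only one result's worth of science is gained per work unit. The function name and the example values are illustrative assumptions (5 copies echoes the SETI@home figure mentioned above); they are not WCG's actual settings.

```python
def useful_fraction(initial_replication):
    """Fraction of donated CPU time that yields unique results.

    Simplified model (an assumption, not BOINC's real scheduler): each work
    unit is computed `initial_replication` times and counts once.
    """
    if initial_replication < 1:
        raise ValueError("initial_replication must be at least 1")
    return 1.0 / initial_replication


# Hypothetical replication levels, for comparison only.
for copies in (1, 2, 5):
    frac = useful_fraction(copies)
    print(f"{copies} copies per work unit: "
          f"{frac:.0%} unique work, {1 - frac:.0%} spent on redundancy")
```

Some redundancy is needed to validate results coming from untrusted machines, so the redundant share is not pure waste; the sketch only shows how quickly the cost grows with each extra copy.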

The Rosetta project, which investigates protein 3D shape prediction, has consistently been among the top performers for ab initio prediction in recent biennial CASP experiments, and it also does disease-related research. Improving those algorithms has a multiplier effect on the progress of other programs (e.g. HPF).

Folding@home is the oldest (since 2000) and biggest life sciences (protein-research) project (still non-BOINC though). But, over the years, I never really understood how the science done at Folding@home would eventually lead to treatments or cures. With Rosetta@home the connection to identifying drug targets and to drug design seemed much more obvious.

So, for the moment I donate my CPU time to

  1. Rosetta@Home 40%
  2. Einstein@Home 30%
  3. SIMAP 15%
  4. World Community Grid ("Human Proteome Folding" and "FightAIDS") 15%
CPDN is another fine project, but it is quite heavy; I would love to add it at a later point, running on a PC of its own. Many projects in my list are new and still have relatively few participants.
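
As a rough illustration of what the percentages above mean in practice, the sketch below converts resource shares into approximate CPU hours per day. It assumes, as a simplification, that CPU time is split in proportion to each project's share over the long run and that the machine donates 24 CPU-hours a day. The function name and the 24-hour figure are my own illustrative assumptions, not part of BOINC.

```python
def split_cpu_time(shares, cpu_hours_per_day=24.0):
    """Split donated CPU hours among projects in proportion to their shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {name: cpu_hours_per_day * share / total
            for name, share in shares.items()}


# Resource shares taken from the list above.
my_shares = {
    "Rosetta@Home": 40,
    "Einstein@Home": 30,
    "SIMAP": 15,
    "World Community Grid": 15,
}

hours = split_cpu_time(my_shares)
for project, h in hours.items():
    print(f"{project}: {h:.1f} CPU-hours/day")
```

Because only the ratios matter, the shares do not have to add up to 100; doubling every number gives the same split.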

Links:

I hope this document will be useful to you,
Dimitris Hatzopoulos,
email: dhatz-dc AT hyper.net

PS: I'll also prepare a simple set of instructions for installing BOINC on Linux in a "sandbox" (i.e. in a secure, isolated account). Meanwhile, Linux users can also consult the Debian Linux installation package and BOINC security pages.



Copyright © 2006-2007
Except where otherwise noted,
content on this site is licensed under the
Creative Commons Attribution 3.0 License