Translations of this page: Belorussian (by Michail Bogdanov)
Last modified: Monday, 22-Oct-2007 17:19:03 EEST
How-To: Join Distributed Computing projects that benefit humanity
What if some of the world's estimated 650 million PCs (and 250 million households with broadband Internet) could be linked to assist scientists in solving critical real world problems? This is exactly what humanitarian grid computing is about!
Donate your computer's idle CPU time to humanitarian non-profit scientific research projects. Help find cures for diseases like cancer, AIDS, diabetes, MS and Alzheimer's, help predict the earth's climate change, or advance science, e.g. search for gravitational waves, help CERN build its latest particle accelerator or help Berkeley search for extraterrestrial intelligence. So you WANT to contribute, but don't know where to start, or how to do it best? Or perhaps you're already contributing to one of the better-known projects, like SETI@Home or Folding@Home, and are looking for more?
There are many Distributed Computing (D.C.) projects requesting CPU time contributions. This is an effort to categorize the various humanitarian D.C. projects, mostly in terms of technical requirements (and in some cases, links to comments on their scientific merit).
I've spent considerable time looking into how to do it best. My priorities when selecting projects: possible value for the world and humankind, not-for-profit, results made available in the public domain, efficient use of my computers' resources, secure and easy to maintain. I'll share my conclusions here, for others to benefit too. From my (user's / CPU time donor's) perspective, I will divide DC projects into two broad categories: BOINC and non-BOINC. I always prefer BOINC, where possible.
Note: The humanitarian grid computing landscape is changing rapidly, as exciting new projects arrive and others pause or are discontinued. So a lot of the information out there on the Internet may be outdated, even if it was written six months ago. This document contains the latest info on distributed computing as of May-2007, and everything presented here has been checked to the best of my ability.
Quick summary about BOINC: It is software that allows you to participate in multiple projects and to control how your PC's time is divided among them. Projects are independent, and each maintains its own servers. Anyone can create a BOINC project. The BOINC developers (University of California, Berkeley) have no control over the creation of BOINC-based projects and do not necessarily endorse them.
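As a rough illustration of what "dividing your PC's time among projects" means: BOINC lets you give each attached project a weight (its "resource share"), and time is split in proportion to those weights. The sketch below only mimics that proportional split. The function name, the example weights and the 24-hour budget are made up for illustration, and this is not BOINC's actual scheduler code.

```python
def split_cpu_time(resource_shares, hours_available):
    """Divide a time budget among projects in proportion to their weights.

    resource_shares: dict mapping a project name to a positive weight
                     (hypothetical example values, chosen by the user).
    hours_available: total CPU hours to hand out.
    """
    total = sum(resource_shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {
        name: hours_available * share / total
        for name, share in resource_shares.items()
    }


if __name__ == "__main__":
    # Example weights only: the first project gets twice the weight of the others.
    shares = {"Rosetta@home": 200, "Einstein@Home": 100, "LHC@home": 100}
    for project, hours in split_cpu_time(shares, 24).items():
        print(f"{project}: {hours:.1f} h/day")
```

With these example weights, the first project receives half of the day and the other two a quarter each.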
BOINC stats: 1,000,000 people and 2,000,000 computers in 234 countries as of Jun-2007
italics = project's BOINC is still in beta-test phase, or on hold
Most diseases are manifested at the level of protein activity. A "bottleneck" in medical research for many diseases is that the function/role of many (approx. 60%) proteins in the human body still remains unknown. In fact, the "Find-a-Drug" project closed in Dec-05 after running for 4 years, because it ran out of new proteins of known functionality to check via "virtual screening" (see FAD to close). First the 3D shape of proteins has to be identified, from which scientists expect to learn about the function of these proteins, as the shape of a protein is inherently related to how it functions in our bodies.
Protein 3D shape identification is currently being done "experimentally" in a laboratory (via X-Ray Crystallography and NMR) at great cost in time and money (btw some bigger proteins are difficult to study "experimentally"). "Protein prediction" projects such as Rosetta@home, TANPAKU, Proteins@home, Predictor@home (as well as the currently inactive Distributed Folding) are all developing algorithms to determine protein 3D structures "mathematically" (via computer simulation, also known as "protein structure prediction" or "protein folding"), which will speed up progress immensely. The same software tools might also eventually be used to design new complex proteins that will inactivate pathogenic organisms (e.g. viruses like the common "flu" and HIV/AIDS, or bacteria) or repair damaged DNA (e.g. "gene therapy" for curing cancer). Other software performs "docking" checks of potential drugs (small molecules) to a target protein ("virtual screening").
Projects like Human Proteome Folding apply the "protein prediction" algorithms developed by one of the aforementioned projects (HPF is using Rosetta), to generate 3D structure predictions for selected proteins (e.g. HPF2 studies cancer biomarkers), for biologists and biomeds to look at and use while "annotating" proteins (deciding what they do, which cell processes they are involved with).
SIMAP computes similarities between proteins, as similarly shaped proteins usually perform the same function (so-called orthologs).
For disease treatment, in the short-term, medical scientists search for drugs via "virtual screening" / "docking" of "ligands" (small water-soluble chemical molecules which can potentially be used as drugs, i.e. which will interact with the protein behind a disease to inhibit or activate it) on known proteins, which have already been identified for diseases. Such projects include CureCancer, FightAIDS@home, Docking@Home, ComputeAgainstCancer, Find-a-Drug (inactive) and D2OL, using "docking" software (e.g. DOCK, LigandFit, AutoDock, THINK, Rosetta etc).
The next step in curing disease (perhaps years away) is to design new (artificial) proteins to perform functions, e.g. Rosetta@home's work for cancer with redesign of a DNA-modifying protein (gene therapies).
Folding@Home uses molecular dynamics (laws of physics) to study the process of protein folding (the "kinetics") and to understand misfolding (aggregation) diseases such as Alzheimer's. It also aims to develop new, more accurate protein-drug "docking" methods.
Life science projects explained in detail:
Rosetta@Home, Proteins@home, TANPAKU, Predictor@home all seek to predict the 3D structure of yet unknown proteins from their amino acid sequences, so biomed scientists can deduce each protein's role / functionality in cell processes (more details later in the "Relevancy of protein prediction projects to cures" section). The difference is in the approach to solve the problem, e.g. Rosetta uses energy functions to find the lowest or most stable state and Predictor uses Monte Carlo simulations using a knowledge based force field based upon a simplified lattice model.
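To give a feel for what "Monte Carlo search for a low-energy state" means, here is a minimal sketch of the Metropolis acceptance rule on a made-up one-dimensional "energy landscape". Everything in it (the toy_energy function, the step size, the temperature) is an assumption chosen for illustration. Real projects search over 3D protein conformations with far more elaborate energy functions, and this is not the code of Rosetta or Predictor.

```python
import math
import random


def toy_energy(x):
    # Hypothetical rugged landscape: a broad bowl with ripples (many local minima).
    return 0.1 * x * x + math.sin(3.0 * x)


def metropolis_search(energy, x0, steps, step_size=0.5, temperature=0.5, seed=0):
    """Random walk that keeps track of the lowest-energy point it has visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        e_new = energy(candidate)
        # Metropolis rule: always accept downhill moves; accept uphill moves
        # with probability exp(-dE / T) so the walk can escape local minima.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


if __name__ == "__main__":
    start = 4.0
    best_x, best_e = metropolis_search(toy_energy, start, steps=2000)
    print(f"start energy {toy_energy(start):.3f}, best energy found {best_e:.3f}")
```

Occasionally accepting an uphill move is what keeps the search from getting stuck in the first dip it finds.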
Folding@Home is an advanced Computational Chemistry project which studies how proteins fold. What is F@H ACTUALLY working on? It's interested in the chemistry of unfolded, partially folded, and completely folded ("native") proteins, as they relate to one another. It does not attempt to predict the final 3D protein structure from the amino acid sequence, but tries to simulate the process of folding on a picosecond-to-microsecond timescale, down at the protein molecule level, using Molecular Dynamics (the laws of physics) to show scientists what actually happens during folding. So, F@H is computing protein folding "pathways" ("trajectories"), like an animated movie (e.g. "Toy Story"): frame after frame, lasting from microseconds up to a few milliseconds, trying to learn enough about the critical parts to help create a better model of the process of protein folding, as well as to understand misfolding (aggregation related) diseases like Alzheimer's. This approach requires enormous amounts of computing power to simulate even small, fast-folding proteins, so currently it's not usable for protein 3D structure prediction.
SIMAP: Because of the huge number of known protein sequences in public databases, it has become clear that most of them will not be experimentally characterized in the near future. Nevertheless, proteins that have evolved from a common ancestor often share the same functions (so-called orthologs). So it is possible to infer the function of a non-characterized protein from an ortholog with known function.
HPF (Human Proteome Folding) applies 3D structure prediction software (Rosetta in particular). Note that Phase-1 of HPF has finished and in Jan-06 the project went into Phase-2 (HPF2) [more]. Here is an extract from the "Is there any result in treatment or we just resolve more and more proteins?" discussion:
"The Human Proteome Folding project is basic medical research. We are given some fundamental components of cells (proteins of unknown function) and we try to deduce their shape, then from this deduce which other proteins they interact with, and how. It is like pouring the components of an "Erector set" (editor: construction toy) onto the living room floor and trying to figure out what goes with what. The goal is to figure out the functional networks that drive basic cell processes. Once you have identified the function of a protein you can:
- Select it as a target for a drug to interfere with its function
- Figure out how it works and design a drug to duplicate the effect
- Develop a diagnostic test to detect the concentration of that protein in order to measure the level of activity
Without the protein information, these three things are very important objectives that can only be accomplished by mass screening of a vast number of chemical compounds, hoping for a lucky breakthrough. Even with this information, a great deal of work, skill and luck is required to develop a drug.
The HPF project can provide very useful information for drug development, but it is aimed at basic understanding that can then be used to develop drugs. We are providing the shape information. Scientists studying the databases with this structural information will predict the function of the proteins (annotate the proteins)."
Relevancy of protein prediction projects (Predictor@Home, Proteins@home, Rosetta@Home, TANPAKU etc) to finding cures for cancer, AIDS, diabetes, Multiple Sclerosis and other diseases [source source2]
The reason it might be difficult to find a project that works directly on curing cancer is that cancer is a huge, complicated disease in which many things are involved. One might study how a growth factor gene can run out of control, or the genetics behind tumor suppressors, or the signaling involved in the programmed cell death pathway, or the environmental factors that lead to mutation, or a dozen other things that contribute to cancer, but it is all of these, combined in a unique way, that lead to different types of cancer.
Cancer indeed is the result of mutations at the DNA level, often multiple mutations (what's known as the "2-hit hypothesis"). However, in order for these mutations to actually cause cancer, they must activate a gene expression program that depends on the action of several proteins. Therefore, targeting the DNA for repair (gene therapy) is one approach for cancer treatment, but another approach (and perhaps more feasible in the short term) is to target the proteins involved in uncontrolled cell growth.
Therefore, some of the most promising recent treatments for cancers come from drugs that target specific proteins involved in cancer (e.g. drugs that target the tyrosine kinase activity of growth factor proteins). The greatest success in such drug discovery has come from screening large libraries of molecules derived from computational chemistry. This approach is very time consuming and costly and not even guaranteed to work. Another approach to drug discovery comes from "virtual screening" or "rational design" based on experimentally solved protein 3D structures. This requires first an existing protein structure and a way to evaluate the effectiveness of a drug interacting with the structure. Developing methods to predict protein structure in a rational way directly relates to drug discovery in this way, not only for designing treatments for cancer, but for many other diseases.
As you can see, it is difficult to directly relate projects to predict protein structure to cancer, but they are working towards developing a technology that will aid the discovery of drugs to help treat such diseases.
More info on "virtual screening":
The research centers on proteins that have been determined to be a possible target for e.g. cancer therapy. Through a process called "virtual screening", special analysis software will identify molecules that interact with these proteins and will determine which of the molecular candidates has a high likelihood of being developed into a drug. The process is similar to finding the right key to open a special lock by looking at millions upon millions of molecular "keys". If we know the structures of proteins responsible for diabetes, for example, and whether their activity needs to be increased or decreased, we can search for small molecules that either activate or inhibit the protein.
Learn more about "rational drug design" and "virtual screening": Wikipedia: drug design, Wikipedia: molecular docking.
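The "millions of keys for one lock" idea boils down to: score every candidate molecule against the target, throw away the poor fits, and hand the best few to the lab. A minimal sketch of that last filtering step follows. The molecule names and scores are invented, the convention "lower score = tighter predicted binding" is an assumption for this example, and computing the scores themselves (the hard part, done by docking software on donated CPUs) is not shown.

```python
def screen(scores, cutoff, top_n):
    """Return the top_n candidates whose score is at or below the cutoff.

    scores: dict mapping a molecule name to a predicted binding score
            (hypothetical values; lower means a better predicted fit).
    """
    hits = [(name, score) for name, score in scores.items() if score <= cutoff]
    hits.sort(key=lambda pair: pair[1])  # best (lowest) score first
    return hits[:top_n]


if __name__ == "__main__":
    # Invented example data, standing in for the output of many work units.
    predicted = {"mol_a": -9.1, "mol_b": -4.2, "mol_c": -7.5, "mol_d": -8.0}
    for name, score in screen(predicted, cutoff=-7.0, top_n=2):
        print(name, score)
```

In a real screen the list holds millions of entries, and the "hits" that survive still have to be synthesized and tested before anyone knows whether they work.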
Protein projects background info (you can skip it, if you're not interested in the nuts and bolts of research)
Baker (Rosetta@home head scientist) explains:
"With the completion of the Human Genome Project, we are now capable of predicting what the amino acid sequences of different proteins in the human body will be. However, while we know the order of the amino acids, proteins do not remain in a two dimensional shapes inside the human body. They "fold" into different three dimensional shapes for various proteins in order to serve their functions inside the human body. A protein with a distinctive amino acid chain does not simply fold in a random manner, but usually folds in a specific configuration. The force behind this phenomenon is believed to be the Hydrophobic Effect. This effect is due to the fact that certain groupings of amino acids are either attracted or repelled by water, therefore they will attempt to either maximize or minimize their exposure to water respectively. Since the fluids that living bodies are composed of primarily consist of water, these amino acid chains fold in on themselves with the chains repelled by water tending to be part of the center of the protein to minimize the amount of their surface area exposed to the body's fluids. Those amino acid chains that are attracted to water tend to be part of the surface of the protein in order to increase the proportion of the chain expose to the body's fluids. Since the forces that act on a protein depend on what the particular amino acid sequence of a protein is, a specific protein will usually 'fold' in the same manner. Due to the complexity of the forces acting upon amino acid sequences large enough to form a protein, it is extremely difficult to predict how the protein will fold, but this knowledge is vital for determining how the protein functions. Devising a way to quickly and accurately predict how a protein will fold could potentially allow researchers to devise treatments for diseases such as Alzheimer's and AIDS.
By using molecular dynamics, it is possible to attempt to use the basic laws of physics to simulate the folding process for a particular protein. (For example the Folding@home Project.) However, such an approach limited in its utility since it requires an enormous amount of processing power to simulate even a simple protein's folding behavior. The Rosetta@home project takes a different approach by utilizing a newly developed software algorithm to try to predict the shape that a protein would be most likely to fold into. An additional piece of software analyzes the projected results and determines which of the various projected results is most likely to be correct. By utilizing a distributed computing project, it is possible to quickly create a database of billions of possible structures for a protein, and thereby obtain an accurate picture of what that folded protein would look like."
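The two ideas in the quote above (water-repelled residues tend to end up buried together, and many candidate shapes are generated and then scored) can be tried out on a classic classroom toy, the 2D "HP" lattice model. In this sketch each residue is either H (hydrophobic) or P (polar), the chain is laid out on a square grid, and every pair of H residues that touch without being chain neighbours lowers the energy by 1. This is a teaching model chosen for illustration only, not Rosetta's algorithm, and the example sequence is made up.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def conformations(n):
    """Yield every self-avoiding path of n residues on a 2D square lattice."""
    def extend(path):
        if len(path) == n:
            yield list(path)  # copy, because path is modified while backtracking
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in path:  # a lattice site can hold only one residue
                path.append(nxt)
                yield from extend(path)
                path.pop()
    yield from extend([(0, 0)])


def energy(sequence, path):
    """-1 for each pair of H residues in lattice contact that are not chain neighbours."""
    index_at = {pos: i for i, pos in enumerate(path)}
    total = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                total -= 1
    return total


def best_fold(sequence):
    """Brute-force search: score every conformation and keep a lowest-energy one."""
    return min(conformations(len(sequence)), key=lambda p: energy(sequence, p))


if __name__ == "__main__":
    seq = "HPPHHPPH"  # made-up example sequence
    fold = best_fold(seq)
    print(energy(seq, fold), fold)
```

Even on this toy grid the number of possible shapes grows very quickly with chain length, which hints at why real structure prediction needs so much donated CPU time.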
Good scientific background reading for volunteer "folders": Wikipedia: Protein structure prediction
More on Protein study projects (Rosetta, Predictor, WCG, Folding@Home, SIMAP) differences: Background about Rosetta software and its use in ISB's HPF project, BOINC-Wiki.
Predictor@home and Rosetta@home differences from forum posts R@H forum and P@H forum. Explanation of Predictor@Home algorithm.
Article Gene Machine in WIRED.com (2001) gives "background info" and talks about world's biggest supercomputer, built for protein folding. The Protein Hunters, WIRED Apr-01. Protein Design Processes biological warfare defense project, by DARPA. Protein Folding Overview articles on the subject of protein folding and disease.
The goal of ClimatePrediction.Net is to predict the evolution of the worldwide climate in the 21st century. The usual way to do this is to start with the best weather measurements and the best estimates of the parameters that govern the evolution of the climate, and to run the simulation on a super-computer.
The problem with this technique is that it ignores the main difficulty: we don't have precise measurements of all the parameters of the current situation and, worse still, we only have an imprecise estimate of some crucial simulation parameters. For instance, the rate at which CO2 is absorbed by oceans is not known with precision. This is one of the reasons why we currently have contradictory long term climate predictions. If the parameters are wrong the simulation is worthless.
In fact, we can get information out of a simulation result even if it starts from parameters that are not really accurate. But we need many of these: thousands, tens of thousands, maybe even more. By analyzing the results of all these simulations one can get a better understanding of the effect of each parameter, determine what the most likely result is, and also determine which parameters are the most important.
But not even a super-computer is up to the task. Hence the idea of running each simulation on a separate computer, using spare CPU cycles and the Internet to collect the results. Here is how it works: the ClimatePrediction team distributes the simulation software, reference climate data and a set of parameters to be tested. The program then simulates the climate evolution over a period of model-decades and returns the results to the ClimatePrediction team.
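The "many runs with slightly different parameters" strategy can be sketched in a few lines. The model below is deliberately trivial: it assumes warming equals an uncertain "sensitivity" (degrees per doubling of CO2) times the number of CO2 doublings, which is a textbook simplification and not the full climate model that CPDN distributes. The parameter range, the function names and the ensemble size are assumptions for the example; each loop iteration stands in for one volunteer's work unit.

```python
import math
import random
import statistics


def toy_warming(sensitivity_per_doubling, co2_ratio):
    # Toy stand-in for a climate model: warming grows with log2 of the CO2 ratio.
    return sensitivity_per_doubling * math.log2(co2_ratio)


def run_ensemble(n_members, sensitivity_range, co2_ratio, seed=0):
    """Run the toy model once per randomly drawn parameter value."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_members):
        sensitivity = rng.uniform(*sensitivity_range)  # one "work unit"
        results.append((sensitivity, toy_warming(sensitivity, co2_ratio)))
    return results


def summarize(results):
    """Spread of outcomes across the whole ensemble."""
    warmings = sorted(w for _, w in results)
    return {
        "min": warmings[0],
        "median": statistics.median(warmings),
        "max": warmings[-1],
    }


if __name__ == "__main__":
    # Hypothetical range for the uncertain parameter; CO2 doubled (ratio 2.0).
    ensemble = run_ensemble(1000, (1.5, 4.5), co2_ratio=2.0)
    print(summarize(ensemble))
```

The point is the shape of the workflow: no single run is trusted, but the spread of results across the ensemble shows how strongly the outcome depends on the uncertain parameter.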
More information on CPDN scientific strategy.
Advantages: Useful, practical project with some beautiful graphics of the GCM simulations.
Disadvantages: The client is large and project data can be over 1 GByte. One block of data takes a long time to compute: a 2.4GHz Pentium-4 will take about 3 weeks (running 24/7).
There are many more BOINC projects. The ones listed above are just the humanitarian ones I've had a chance to look at more closely so far (breaking encryption algorithms, for example, may be an interesting challenge, but it's not one that benefits humanity). I'm sure there are other projects worthy of your CPU's idle time. You can find more in the Links section at the end.
Some people become so enthusiastic that they create "computer farms", built to "crunch" (compute) as much as possible, and compete against each other to earn the most credits (work done) for their favorite projects.
Learn more: Unofficial BOINC Wiki, BOINC page at Wikipedia, SETI@Home page at Wikipedia.
Advantages: Easier to set up for non-technical users (no per-project registration and configuration required). Useful if you want to contribute to just one project, or if your favorite project is not available via BOINC. If you choose to go this way (non-BOINC), it should be either because you're not computer savvy or because you really want to contribute to a specific non-BOINC project.
Disadvantages: Unlike BOINC projects, stand-alone agents are often mutually exclusive: e.g. if you're running FightAIDS and try to install Oxford's anti-cancer project, the installer will ask you to un-install WCG (see comment). So you can't run both projects. Also, in some cases in the past, companies used a small (1%) part of the aggregate CPU time to run commercial interest projects (effectively becoming "brokers" of CPU time) and paid some of the proceeds to users or donated it to charities on their behalf.
Most of these projects are backed by big companies e.g. IBM or Intel for marketing support (i.e. "visibility" and company "public relations"). I can't help noticing that commercially backed projects tend to put catchwords like "cancer" or "AIDS" in the title, which really get people's attention...
The projects which run on "proprietary" DC platform software (but all except grid.org/Oxford's are already or soon will be on BOINC as well) include:
IBM is installing the WCG agent (automatically, as "recommended software") at their corporate sites. In a way, one can think of WCG as a "commercially" run / supported DC Framework for multiple projects (currently hosting 3 projects). An advisory board decides which projects receive support. It is run and maintained by IBM (when you join, you actually enter into a legal contract with IBM). IBM says control of the project will eventually be transferred to a suitable non-profit entity once the project gets firmly on its feet.
Quote from the WCG End User License Agreement:
"WELCOME TO WORLD COMMUNITY GRID. THIS IS THE AGREEMENT THAT APPLIES TO YOUR PARTICIPATION IN WORLD COMMUNITY GRID. IT IS BETWEEN YOU AND INTERNATIONAL BUSINESS MACHINES CORPORATION (WE WILL REFER TO OURSELVES AS "IBM" OR "WE"). IT IS ANTICIPATED THAT EVENTUALLY IBM WILL TRANSFER WORLD COMMUNITY GRID TO A NON-IBM OWNED NOT-FOR-PROFIT ENTITY. SUCH A TRANSFER WOULD CONSTITUTE AN IBM CONTRIBUTION. IF AND WHEN THAT OCCURS, THIS AGREEMENT WILL ALSO APPLY TO THAT ENTITY."
WCG is available as a BOINC project, and now one can also fine-tune CPU resources or opt-out of its individual sub-projects (I don't mind, both HPF and FightAIDS are equally good imo). Normally, UD/WCG and BOINC/WCG users will get work-units of both projects automatically.
grid.org stats: 1.3 million people, 3.5 million CPUs (but I wonder, how many of them are active?)
"Using Screensaver Lifesaver / LigandFit technology, we have identified three promising leads Mol597, Mol238 and Mol628 for urokinase inhibition. Based on the docking studies these molecules show high potential as uPA inhibitors. These molecules can be used as lead molecules for the design of better uPA inhibitors as potential anti-cancer therapeutic agents. We are synthesizing these molecules and testing them for biological activity."
Emailed feedback from Prof. Richards 30-Nov-05 (source)
The screensaver project has produced a large number of 'hits'; molecules predicted to be potential inhibitors of proteins and possible leads for drug discovery. The bottleneck to exploitation of these predictions is the potentially costly synthesis of the compounds and their biological testing.
For two of the series we have passed that stage and the results are very encouraging.
One of the more interesting and challenging targets is the phosphatase.
Much of biology is a balance between phosphatases and kinases. The latter have been important commercial targets for anti-cancer drugs for many years, but the phosphatases have proved more difficult as for one thing the binding site into which the drug must bind is rather ill-defined. The project produced some 128,000 hits for this target. We analysed the results and produced a list of 400 good hits of differing chemical types. These were synthesized and tested. Over 40 [~10%] proved to be genuine inhibitors which is very good by industry standards and what is more they are uncharged molecules which are very different from known inhibitors. We are trying to find pharmaceutical companies to take this further.
For a second series where the target is urokinase plasminogen activator [implicated in prostate cancer] we have again had hits synthesized and tested, this time by collaborators at the Arizona Cancer Center. This too has produced encouraging novel active compounds. These results were presented at the American Association for Cancer Research and a poster giving details is being put on the Oxford Chemistry web site.
"We have also targetted most of the recognised protein targets for the major project areas. [...] Our experience suggests that it will be difficult to find collaborators who will be interested in the results of targetting proteins whose biological function is unknown or of little therapeutic interest." -Keith Davies, FAD
The first phase of the internet-based anti-cancer computing project "Screensaver Lifesaver", aka CureCancer, used the THINK science software and was hosted by United Devices, in collaboration with the National Foundation for Cancer Research (NFCR), Oxford University and Intel Corporation. The second phase of the NFCR+Oxford+UD Cure-Cancer/Screensaver-Lifesaver project chose to use the LIGANDFIT software. At that point, Keith Davies, who is an honorary research fellow at the University and the founder of Treweren Consultants, which developed the THINK software, established Find-a-Drug to continue research using the THINK software.
Note: The stand-alone projects mentioned above are just the most widely followed (over 100,000 participants). There are many more "stand-alone" distributed computing projects available. You can find more at the distributedcomputing.info and www.volunteerathome.com sites, which have extensive, helpful project profiles and also explain BOINC, WCG and UD. Hopefully more projects will become available under the BOINC platform in the future, which manages them to achieve optimal performance.
Still, trends probably indicate that the "donating user" community is focused on more "technical" projects, i.e. SETI+CPDN+Einstein+LHC consume about 90% of overall BOINC CPU time. I think one reason is that most "life science" projects came online via BOINC only recently; another is concern about potential exploitation of biomedical research results by BigPharma. Life science projects should try to address these concerns (read the heated discussion at UD-Intel cancer forum and request for feedback at HPF forum, more doubts and answers). Regular feedback is important for "troop morale" (still, plenty of CPU power will go to projects with the prettiest graphics and fancy stats). Personally, I want transparency and full disclosure.
Q. Is Rosetta@Home non-profit? / Is someone going to make money out of my donated computer time?
Know who owns the data sets that come out of your processor and read their usage commitments in no-nonsense legal terms. If they don't make it easy to locate those terms then find another project.
A. The University of Washington makes the source code (the actual Rosetta algorithm, which our donated computer power is helping improve) available to academics and other universities for free, and will eventually make the source code available to the public, when the possible cheating problem on BOINC is addressed. Dr David Baker: "Everything will be public domain ... No, I do not believe in patenting naturally occurring genes, proteins, etc."
Opportunity cost, e.g. FightAIDS@home - Seems like all the "processing" will be done within a year, and then a long time will be spent looking at the 'hits'. It is just an example of the researchers being the bottleneck, not the CPUs. (valid questions from F@H forum posts)
My choices
How did I decide which of all BOINC projects to contribute to? Due to most "life science" projects having joined BOINC fairly recently (e.g. Rosetta@home came online out of beta Nov05) they are under-represented in terms of CPU time. Looking at late-2005 stats, about 90% of donated CPU time via BOINC goes to SETI+CPDN+Einstein+LHC. Wrt SETI, it is my opinion that if one wants to look for extra-terrestrial intelligence, one should be primarily examining the UFO evidence, rather than trying to pick up radio signals, because an advanced civilisation might be using all sorts of physics that we don't know about yet, such as quantum entanglement (although certainly radio-astronomy is a great window to the universe).
Anyway, SETI@home has plenty of CPU power already, enough to process every WU 5 times. It's not constrained by available CPU power. At this point SETI@home could use more funds (monetary donations), rather than more CPUs.
Wrt physical sciences projects, I think understanding gravity will allow humanity its next big leap forward, along with a better understanding of the universe. So both Einstein and LHC are important.
Human Proteome Folding (WCG/HPF) which applies the Rosetta software on the human proteome seems also useful. WCG forums discussion (illuminating). Unfortunately, WCG's (IBM's) current redundancy (initial replication, quorum) settings are just too wasteful.
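To see why redundancy settings matter: with replication, each work unit is sent to several computers, and a result is only accepted once enough of the returned copies agree (the "quorum"). The sketch below shows that idea plus the resulting efficiency. The function names are made up, results are compared by simple equality (real projects typically allow for small numerical differences), and this is not the actual BOINC or WCG validator.

```python
from collections import Counter


def canonical_result(returned_results, quorum):
    """Return the agreed result, or None if no value reaches the quorum yet."""
    if not returned_results:
        return None
    value, count = Counter(returned_results).most_common(1)[0]
    return value if count >= quorum else None


def useful_fraction(initial_replication):
    """Share of donated CPU time that produces new science.

    If every work unit is computed initial_replication times, only one of
    those computations adds a new result; the rest are cross-checks.
    """
    return 1.0 / initial_replication


if __name__ == "__main__":
    print(canonical_result(["a1f3", "a1f3", "ffff"], quorum=2))  # agreed value
    print(canonical_result(["a1f3", "0000", "ffff"], quorum=2))  # no agreement
    print(useful_fraction(5))  # every work unit processed 5 times
```

Some redundancy is needed to catch faulty or dishonest results, but every extra copy is CPU time that could have gone to a new work unit.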
The Rosetta project, which is investigating protein 3D shape prediction, has consistently been among the top performing methods for ab initio prediction in recent biennial CASP experiments and also does disease-related research. Improving those algorithms has a multiplier effect on the progress of other programs (e.g. HPF).
Folding@home is the oldest (since 2000) and biggest life sciences (protein-research) project (still non-BOINC though). But, over the years, I never really understood how the science done at Folding@home would eventually lead to treatments or cures. With Rosetta@home the connection to identifying drug targets and to drug design seemed much more obvious.
So, for the moment I donate my CPU time to
I hope this document will be useful to you,
email: dhatz-dc AT hyper.net
PS: I'll also prepare a simple set of instructions for installing BOINC on Linux in a "sandbox" (i.e. in a secure isolated account). Meanwhile, Linux users can also consult: Debian Linux installation package and BOINC security.
Copyright © 2006-2007
Except where otherwise noted,
content on this site is licensed under the
Creative Commons Attribution 3.0 License