Organizers: Adam Belloum and Zhiming Zhao
Virtual Laboratory for e-Science (VL-e)
in conjunction with the IEEE Int’l Conf. e-Science 2006
As in many e-Science
projects, workflows play an important role in the VL-e projects. Taking the opportunity of having the
e-Science conference 2006 organized in the
The workshop consists of two oral sessions and one panel session. Four of the invited speakers have, among their other achievements, been very active in the design and/or development of four well-known Workflow Management Systems (WMS) currently used in a number of research projects around the world: Pegasus, Kepler, Triana, and Taverna.
Three of these systems have been recommended to the VL-e community as part of what is known in the VL-e project as the short-term solution, as became clear in the talk of Prof. Adriaans, member of the VL-e directorate board and research program leader. The VL-e end users cover a number of scientific domains, including data-intensive science, food informatics, medical, biodiversity, bioinformatics, and tele-science. It was not possible to find a single WMS that could handle all the requirements we collected from the different VL-e users in the first phase of the VL-e project. We have thus recommended three of these systems, which should, in principle, allow users to start doing interesting research work right away. The longer-term view of the workflow group within the VL-e project is that, during the lifetime of the project, we should provide these users with a more elegant and generic solution that increases re-usability and knowledge transfer across the six scientific domains.
The discussion about WMS would not be complete without speakers representing the industry point of view, which is why we also invited two talks from industry. Unfortunately, one of our invited speakers was not able to attend the workshop. Only Mr. König, a senior technical staff member from IBM Germany, could join; he delivered a very interesting talk about the Business Process Execution Language (WS-BPEL) 2.0.
“Meeting the Challenges of Managing Large-Scale Scientific Workflows in Distributed Environments” by Ewa Deelman
Summary: In this talk Ewa Deelman discussed several challenges associated with scientific workflow design and management in distributed, heterogeneous environments. Based on prior work with a number of scientific applications, she described the workflow lifecycle and the concept of a workflow template, from which a number of instances can be created and executed. She also discussed the experiences so far and the challenges ahead as they pertain to the user experience, planning the workflow execution, and managing the execution itself.
Dr. Ewa Deelman is a Research Team Leader in the Center for Grid Technologies at the USC Information Sciences Institute. She is also a Research Assistant Professor in the Computer Science Department at USC. Her main area of research is scientific workflow management in Grids. As part of this work she is leading the design and development of the Pegasus software, which maps complex application workflows onto distributed resources. Pegasus is being used in a variety of scientific applications.
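The idea of a workflow template from which several executable instances are created, mentioned in the summary of this talk, can be illustrated with a minimal Python sketch. It is not how Pegasus represents workflows; the class `WorkflowTemplate`, the step names, and the command strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class WorkflowTemplate:
    """An abstract workflow: ordered steps whose commands contain $placeholders."""
    name: str
    steps: tuple  # (step_name, command_pattern) pairs

    def instantiate(self, **bindings):
        """Bind every placeholder to a concrete value, giving one instance."""
        return [(step, Template(cmd).substitute(bindings)) for step, cmd in self.steps]


# Hypothetical two-step analysis; the tool names are illustrative only.
template = WorkflowTemplate(
    name="align-and-summarize",
    steps=(
        ("align", "align --input $dataset --out $dataset.aligned"),
        ("summarize", "summarize $dataset.aligned --site $site"),
    ),
)

# One template, several concrete instances (one per input dataset).
instances = [template.instantiate(dataset=d, site="siteA") for d in ("run1", "run2")]
```

Each instance is a plain list of concrete steps that a planner could then map onto resources; a placeholder left unbound raises `KeyError`, so an incomplete instance is never produced.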
Summary: Bertram Ludäscher presented his view of scientific workflows as the domain scientist’s way to harness cyberinfrastructure for e-Science. He discussed workflows from different angles: the scientific domain view, the engineering view, and the computer scientist’s view. He presented the actor-oriented modelling used in the Kepler project. He also presented a number of challenges in scientific workflow design and some ways of addressing them, such as semantic annotation and collection-oriented modelling and design.
Dr. Bertram Ludaescher is an Associate Professor
at the Department of Computer Science and the Genome Center at the University of California, Davis. He
is also a fellow of the
Summary: Ian Taylor presented the Triana workflow system within the context of the workflow community at large. He provided a brief background on Triana and discussed the ways in which it has been used in the past for serial as well as distributed tasks. He also presented the Triana distributed architecture and its key features: its user interface and its ability to work simultaneously in heterogeneous distributed environments.
Dr. Ian Taylor is the coordinator for the
Triana project. His research and implementing
artificial-neural-network types for the determination of musical pitch. He is
the head of the Triana developer team; he supported the initial C++
implementation of Triana and later rewrote it in
Java. He also has contracts for NRL in
Summary: Peter Rice presented the EMBRACE project, a network of European partners providing services that integrate the major data resources and analysis software tools using web services and emerging grid technologies. He described the preferred client for these services, Taverna from the myGrid project. He also discussed “What could possibly go wrong?” when the data resources and analysis software start being used.
Prof. Peter Rice is investigating & advising
on the e-Science & Grid technology requirements of the EMBL-EBI, through
application development plus participation in standards development. He is actively
contributing to several large-scale research collaborations: the myGrid project, the
Summary: Dieter König gave an overview of the WS-BPEL language and showed how it can be used to compose Web services. He provided highlights of WS-BPEL, including structured activities, correlation, compensation, and fault handling. Finally, he presented the work of the OASIS WS-BPEL Technical Committee, the current status of the standard, and an outlook on follow-on activities.
Dr. Dieter König is a software architect for workflow systems at the IBM Germany Development Laboratory. He joined the laboratory in 1988 and has worked on Resource Measurement Facility for z/OS, MQSeries Workflow, and WebSphere Process Choreographer.
Summary— P. Adriaans gave an overview of the
structure and the
Prof. Pieter Adriaans is a professor of machine learning/artificial intelligence at the UvA. He founded Syllogic Systems (www.perotsystems.com). He is also an advisor to Robosail Systems, a company that manufactures and sells self-learning autopilots, senior research advisor for Perot Systems Corporation, and a member of the VL-e directorate board. Adriaans is a member of the ICGI (International Conference on Grammar Induction) steering committee.
Panelists: E. Deelman (ED), D. König (DK), B. Ludaescher (BL), P. Rice (PR), I. Taylor (IT)
Participants from the audience: Carole Goble (CG), Jeroen Snel (JS), Silvia Olabarriaga (SO), Marian Bubak (MB) …
The panel discussion started with two short presentations by Zhiming Zhao and Marian Bubak, which aimed at raising a number of challenging topics (including provocative statements) for the panel discussion. Zhiming described the challenging issues from the VL-e point of view, and Marian described the challenges as seen by the e-Science community; he compiled the list of challenging issues from the talks presented on the first day of the e-Science conference.
NB: The following summary reflects only what we understood from the discussion; it does not reproduce the panelists' statements word for word. We apologize to the panelists and to the audience if we have misinterpreted some of their statements. We also invite everyone who participated in the discussion to send us comments on the following minutes.
■ Low level paradigms?
■ Exploitation of knowledge?
■ Finding something which will enable interoperability of all workflows? Are we going to develop PL/1, some superset of all programming languages?
■ Should we find a generic workflow which interfaces to:
● domain ontologies
● computing resources
● provenance system?
PA: What will be the future of e-science workflow management systems in a couple of years? What will be the top three issues to be addressed?
PR: What will be the future of e-science workflow management systems in a couple of years? We will get things working across domains. We already work in the bioinformatics domain, and we have managed to make some things work across domains with some tweaking. What will be the top three issues to be addressed?
Q from Marian
DK: A general comment from the industrial perspective: in all domains and product areas we have encountered workflows. It is a commonly occurring theme that grows continuously across various areas. We try to drive the BPEL standard and all related standards. BPEL can be considered the analogue of SQL for databases.
ED: Why don't people use PSEs? We notice that scientists still use scripts, while workflow systems promise to relieve them from the pain of scripts. We need to actually deliver on the promise of reliable, easy-to-use workflows.
IT: Q5 is answered by 1, 2, 3, 4; the whole field needs to be defined, and we are still discussing it.
Q1: Scripting is not the only paradigm; portals should be taken into consideration.
Where are we going to be in the future: convergence of technologies. Various systems have focused on different things but are doing the same thing.
CG: Perhaps we are asking the wrong questions. Scientists care more about the workflow than about the workflow system. They will care about the workflow, whatever system they use, as long as it does what they want. In the future there will be a pool of workflows; we should be expecting that. If we are successful, we will have a lot of them.
DK: What we create is a library of workflows; the user does not care about the underlying system.
BL: How can we motivate scientists to share workflows? Sharing means giving away their intellectual property before they have managed to write their paper or get a Nobel Prize. The promise of workflows is to show exactly how you performed your experiments. Sharing is perhaps possible after they get their results published. We may need some mechanism to recognize who discovered the workflow or idea first.
ED: Sharing of data can be done in a small circle or among large collaborations. A workflow is a good way to share results.
PA: In the bioinformatics domain, when you publish a sequence in a journal, the sequence should be publicly available; this has been the tradition since the 1980s. You also have to explain what you did and how you obtained it. A similar mechanism for publishing workflows would be good.
JS: If we view a workflow as a sequence of web service calls, how do you share the logic of the web services? If you share a workflow, you should also think about sharing the logic and all the information behind it.
BL: The notion of nesting might help to solve these problems. The underlying model, which we do not have now, needs to support that. We need to be able to look inside and see what kinds of services are used, and to distinguish between black-box and white-box components.
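The distinction between black-box and white-box components under nesting can be sketched in a few lines of Python. This is only an illustration of the idea raised in the discussion, not the model of any of the systems represented on the panel; the classes `BlackBox` and `WhiteBox`, the function `describe`, and the example components are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class BlackBox:
    """A component whose internals are hidden (for example a remote service)."""
    name: str
    run: Callable[[object], object]


@dataclass
class WhiteBox:
    """A composite component whose nested sub-steps can be inspected."""
    name: str
    parts: List[Union["WhiteBox", BlackBox]] = field(default_factory=list)

    def run(self, value):
        # Feed the output of each part into the next one.
        for part in self.parts:
            value = part.run(value)
        return value


def describe(component, depth=0):
    """Return an indented outline; black boxes appear as opaque leaves."""
    pad = "  " * depth
    if isinstance(component, BlackBox):
        return [f"{pad}{component.name} [opaque]"]
    lines = [f"{pad}{component.name}"]
    for part in component.parts:
        lines.extend(describe(part, depth + 1))
    return lines


# Hypothetical pipeline: a nested white box followed by an opaque step.
clean = WhiteBox("clean", [BlackBox("strip", str.strip), BlackBox("lower", str.lower)])
pipeline = WhiteBox("pipeline", [clean, BlackBox("remote-count", len)])
result = pipeline.run("  ACGT ")
outline = describe(pipeline)
```

A shared workflow built this way exposes its structure down to the level of each black box, while whatever happens inside a black box stays with its provider.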
ED: The world is not that simple; you don't have control over all the components that you are using. You just have to keep as much information as you can. With non-service application components it is easier; with services it is more complicated.
CG: Take BioMart as an example: there is no information about the inputs and outputs. The logic of the services is not exposed by the EBI people. How do you persuade service providers to expose enough business logic, but not too much, only up to the point where they want other people to know about it?
PA: The issue of workflows is independent of the scientific domain that we are studying, but it is more important for experimental/empirical science. Mathematicians might not be interested in workflows?
PR: Gave the color proof as a counterexample of a workflow in mathematics.
DK: What is the right granularity when you use BPEL? You have to decide which pieces you want to publish and which you want to hide.
PA: How is it done in business?
DK: In business, too, there are many different domains using workflows. The business logic of a workflow is exposed, but company-secret logic is not.
BL: Some workflows will be computationally intensive, others data intensive; nevertheless, there are similar components throughout the different domains. An analogy is that databases are used in many different domains.
SO: I am a user; if I hear "workflow", I don't know what a workflow is. I have an application that I developed in a programming language. Could we instead see the problem not as a workflow, but as a big virtual computer that you program with some specific programming language?
BL: We can learn from programming languages; there is no reason not to have Taverna scripts, Kepler scripts, etc. What is the underlying computational model within a workflow? It does not always mean a DAG. If you need loops or streaming, what would you do? You should not have to go back to full programming; otherwise you end up back at Python.
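The question of whether a DAG is a sufficient computational model can be made concrete with a small Python sketch using the standard-library `graphlib` module (Python 3.9 or later). The task names and the helper `execution_order` are hypothetical; the sketch only shows that a pure DAG scheduler can order dependent tasks but rejects a workflow containing a loop, which is the limitation the remark points to.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def execution_order(dependencies):
    """Return one valid run order for a DAG given as {task: set of prerequisites}."""
    return list(TopologicalSorter(dependencies).static_order())


# A workflow that is a DAG: every task runs after its prerequisites.
dag = {
    "fetch": set(),
    "align": {"fetch"},
    "plot": {"align"},
    "report": {"align", "plot"},
}
order = execution_order(dag)

# A workflow with an iterative refine/evaluate loop is not a DAG,
# so a pure DAG scheduler cannot order it.
looped = {"refine": {"evaluate"}, "evaluate": {"refine"}}
try:
    execution_order(looped)
    loop_rejected = False
except CycleError:
    loop_rejected = True
```

Supporting loops or streaming therefore requires a richer model than dependency ordering alone, for instance explicit iteration constructs or dataflow between long-running components.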
IT: Workflows have been around for a long time, but not many people are trained to think in workflow concepts. It will take time for people to be able to think in terms of workflows.
What are the main issues in the field of workflows for e-Science in the coming years?
ED: Today we look at workflow systems as monolithic systems. We could also see a workflow as a high-level, application-specific description that can be compiled down to execution, and so forth. Standardization could be done in the intermediate layer, leaving scientists more flexibility at the high level.
DK: I agree that there must be a layer on top of BPEL for scientists to use. Not all scientists must learn BPEL.
BL: Workflow design, workflow design, workflow design. We want to enable scientists to get their ideas into an executable environment that other people can use, and so accelerate science.
PR: These workflows have a strong workflow flavor. Grid -> e-Science -> workflow. It all comes down to working together and sharing ideas.