IST Project IST-1999-11683
SAFIRA
Supporting Affective Interactions for Real-time
Applications
Deliverable D7.1 Evaluation Plan
Elisabeth André, DFKI; Yasmine Arafa, ICSTM; Luís Botelho, ADETTI; Pedro Figueiredo, ADETTI; Patrick Gebhard, DFKI; Kristina Höök, SICS; Thomas Kulessa, GMD; Carlos Martinho, INESC; Ana Paiva, INESC; Paolo Petta, ÖFAI; Pedro Ramos, ADETTI; Phoebe Sengers, GMD; Marco Vala, INESC

Abstract:
The main goal of workpackage 7 is to evaluate the impact of the affective components on the interaction with the users. Each of the demonstrators will be evaluated in the process of embedding affective components and in the interaction established with users. Thus, two sets of systems will be evaluated: the SAFIRA toolkit and the three demonstrators. This document contains the overall plan for this evaluation.

Keywords: Affective interaction, user studies, toolkit evaluation, software engineering
Document Id: D-SAFIRA-WP7-D7.1
Commission Ref.: D7.1
Classification: project&commission
Distribution: project&commission
Date: 30th April 2001
© 2001 SAFIRA Consortium
Evaluation Plan

Deliverable Number: D-SAFIRA-WP7-D7.1
Version: 1.0
Contractual Date of Delivery to CEC: 30 Apr, 2001
Actual Date of Delivery to CEC:
Editor:
Workpackage: WP7
Deliverable Type: P-Public
Deliverable Nature: R–Report
Workpackages Contributing: WP2, WP3, WP4, WP5, WP6, WP7
Partners Contributing: INESC, ADETTI, DFKI, GMD, ICSTM, OFAI, SICS
This document is a Deliverable of the SAFIRA project (“Supporting Affective Interactions for Real-time Applications”), a project performed within the IST programme (project identifier IST-1999-11683). The SAFIRA project is performed by a consortium consisting of the following partner organizations:

• Instituto de Engenharia de Sistemas e Computadores - INESC (P, Coordinating Partner)
• Associação para o Desenvolvimento das Telecomunicações e Técnicas de Informática - ADETTI (P)
• Deutsches Forschungszentrum für Künstliche Intelligenz GmbH - DFKI (D)
• GMD – Forschungszentrum Informationstechnik GmbH - GMD (D)
• Imperial College of Science, Technology and Medicine (UK)
• Austrian Research Institute for Artificial Intelligence - OFAI (A)
• Swedish Institute of Computer Science - SICS (SE)
Project Contact information: Project Manager: Ana Paiva
Instituto de Engenharia de Sistemas e Computadores - INESC Rua Alves Redol 9, 6 º Esq. P-1000 LISBOA Portugal
Tel: +351 21 3100219 Fax: +351 21 3145843 Email: ana.paiva@inesc.pt
Project home page:
http://gaiva.inesc.pt/safira
Executive Summary
This Deliverable (D-SAFIRA-WP7-D7.1) is the first deliverable from WP7 (Evaluation) of the IST-sponsored SAFIRA project. It contains a plan for the evaluation of the toolkit (Task 7.1) and the three demonstrators (Task 7.2) to be developed in the project. The proposed toolkit evaluation takes both a general software engineering perspective, in which each component in the toolkit is evaluated for efficiency, and a specific stance with respect to how emotions are modelled and how affective interaction is enabled. The demonstrator evaluation aims both at providing input to design and at understanding how affective interaction influences end-user usage.
This deliverable outlines the timetable for the evaluation, the criteria and measurements to be applied, and methodological concerns. The overall goal is to provide input to the general understanding of where and when affective interaction will best come into play. The objective of WP7 as per the Technical Annex, p. 40, was:
The main goal of this workpackage is to evaluate the impact that the affective components will have in the interaction with the users. Each of the applications will be evaluated in the process of embedding affective components and in the interaction established with users. The evaluation of the SAFIRA approach will be done in two ways:
1) by evaluating the framework and toolkit through the analysis of the process of embedding the affective components in applications; and

2) by evaluating the impact that such affective components have on the developed demonstrators that utilise affective reasoning and emotion-based interactions.

To fulfil the first point we have studied a set of customary software engineering criteria, as well as specific criteria that a real-time affective toolkit must meet. The technical evaluation criteria capture all aspects that are not immediately dependent on the project’s subject domain, but are considered good practice according to current references in software engineering. Following good practice is crucial for a software package intended for open source distribution, as in the present case. The partners have assembled a set of measurements and means to evaluate the toolkit.

To fulfil the second point we have reviewed a set of end user studies, looking for measurements and methods. Three sets of experiments have been planned and are discussed in this document. The topics discussed include indications of how evaluation activities are to be carried out during the project: as continuous accompanying measures in the development of the toolkit, and as dedicated evaluation sessions for the demonstrators.
Table of Contents

1 TASK 7.1: FRAMEWORK AND TOOLKIT EVALUATION PLAN
  1.1 SEMANTIC EVALUATION CRITERIA
    1.1.1 WP2: Integration
    1.1.2 WP3: Affective Sensory
    1.1.3 WP4: Reasoning and Planning with Emotions
    1.1.4 WP5: Communicating and Expressing Emotions
  1.2 TECHNICAL EVALUATION CRITERIA
    1.2.1 Code Quality
    1.2.2 Performance
    1.2.3 Management Support
    1.2.4 Maintenance Support
    1.2.5 Extensibility Support
    1.2.6 Documentation
    1.2.7 Specification of Priorities
  1.3 TIME PLAN
2 TASK 7.2: DEMONSTRATOR EVALUATION PLAN
  2.1 THE ROLE OF AFFECT IN INTERACTION
  2.2 PREVIOUS STUDIES
    2.2.1 Studies of Interactive Characters
  2.3 CRITERIA FOR SUCCESS
  2.4 EVALUATION OF THE INFLUENCING MACHINE
    2.4.1 Method
    2.4.2 Time Plan
  2.5 EVALUATION OF FANTASYA
    2.5.1 Method
    2.5.2 Time Plan
  2.6 EVALUATION OF THE WINE BUTLER
    2.6.1 Method
    2.6.2 Time Plan
3 REFERENCES
4 ANNEX I: FANTASYA
  4.1 AN INVOLVING CONCEPT
  4.2 THE INITIAL SITUATION
  4.3 A FANTASTIC WORLD
  4.4 THE MYSTERIES OF ALKEMHYE
  4.5 AGENT INTERACTION
  4.6 REFERENCES
1 TASK 7.1: FRAMEWORK AND TOOLKIT EVALUATION PLAN
The Technical Annex of the SAFIRA project gives the following overall guiding specification for the evaluation to be carried out as an integral part of the project:
\"The development of the demonstrators will be based on the components (developed in WP3, WP4, and WP5) provided by the affective toolkit (WP2).
In this task, a set of tests will be developed to assess the use of the framework and toolkit by the developers of the demonstrators. This evaluation will be qualitative in nature.\"
One of the aims of SAFIRA is to make a major contribution to advancing from tentative early research in affective computing towards facilitating a broader investigation of the actual benefits of including explicit coverage of affective aspects of interaction in the design of next-generation applications. It is obviously too early to think of such a task in terms of encompassing standardizations or guidelines of how to map what kind of desired functionality to what combination of affective support functions, or what characteristics these specific kinds of components ought to have. Even so, an important part of the work carried out in SAFIRA regards exactly these kinds of problems, as they necessarily have to be tackled and solved in the design and final integration of the toolkit components and in the development of the demonstrators. It is these valuable experiences and insights – including any possible negative results – that shall be captured and documented in a principled and thus reusable way in the course of Task 7.1.
In order to achieve the goals thus set, this part of the evaluation plan is structured along two main dimensions. A first distinction is made between:
1) semantic, or domain-dependent, evaluation criteria for quality assessment on the one hand, and

2) technical, or domain-independent, evaluation criteria for framework and toolkit component quality assessment on the other hand.

The former are derived from the domain expertise of the individual contributors, while the latter are taken from recently compiled and published reference works on component-based software engineering and real-time software architecture evaluation.
Complementing this basic skeleton, the three roles covered by SAFIRA consortium members developing software[1] were explicitly considered in the definition of the evaluation plan of Task 7.1. The three groups of relevance in Task 7.1 are thus the following:

[1] The role of the users of the completed demonstrators is covered in Task 7.2, in the second chapter of this deliverable.
• the group of providers, formed by the developers of toolkit components (WP3, WP4, WP5);

• the framework integrators, addressing issues such as interoperability, communication and coordination within the toolkit and between the toolkit and applications (WP2);

• the group of toolkit and component end users, represented by the developers of the various demonstrators (WP6).
1.1 SEMANTIC EVALUATION CRITERIA
The semantic evaluation criteria capture all aspects pertinent to the project’s subject domain of supporting affective interaction in real-time applications to be assessed within the project’s lifetime. As explained in the introduction, these criteria are to be obtained from the domain experts involved in the project itself. In order to ensure as complete and unbiased a coverage as possible, all contributions were collected under the distinct viewpoints of the roles covered by consortium members (providers, integrators, and end users), so that every feature was addressed from complementary perspectives. The input provided by component providers includes, for each problem or issue covered:
a) a concise specification of the problem or issue addressed;

b) a brief characterisation of the current state of the art;

c) a definition of the goal aimed for, e.g., the advance in the state of the art, or the kind of consolidation of dispersed results;

d) the provision of evaluation criteria and/or test cases.
In a similar fashion, the following catalogue of information was scheduled to be gathered in the course of the evaluation task from toolkit component end users at different times of the development of the respective demonstrators:
• before implementation, for each affective functionality required:

a) a concise specification of the affective real-time functionality required in the demonstrator;

b) an estimation of the resources to be provided, expected to be required and sufficient for the realization of the affective functionality using the toolkit. Informally, this information should allow tracking of how well the problem statement and the related envisioned solution could be specified beforehand, so as to provide indications for the compilation of advice and insights of general value as part of the evaluation task;

c) a characterisation of the kind of guidance and assistance in the actual realization of the affective real-time functionality expected to be made available as part of the toolkit. Informally, this information should provide a basis for the authoring of usable documentation covering relevant details for each developed component – an essential and notoriously difficult part of the engineering process;
d) a description of the expected impact of the realization of the given affective real-time functionality, as well as the provision of evaluation criteria for actual assessment.

• during/after implementation, for each toolkit functionality actually used:

a) an evaluation of how well the toolkit succeeded in meeting the expectations of provided functionalities, importantly including documentation of any regards in which utilization of the framework either failed altogether or exceeded expectations (e.g., by providing richer structuring and support for a phenomenon considered to be one-dimensional at the outset);

b) an assessment of the technical, domain-independent properties of the toolkit as detailed in the following subsection.
Finally, toolkit and component framework integrators were asked to contribute key overarching properties of the collection of functionalities provided.
In the following, we present the semantic evaluation criteria as available at the present stage of the project for workpackages 2-5 in detail. WP6 is covered in the second section of this deliverable.
1.1.1 WP2: Integration
From the integrator’s point of view, a proper semantic treatment of the target domain is ultimately reflected in how well the component catalogue of the framework meets desirable criteria [Jazayeri 1995]. The component catalogue reflects the overarching properties of the collection of functionalities provided. Four main characteristics have been chosen to be assessed during the qualitative semantic evaluation of this workpackage:
1) Systematic taxonomy of components in the catalogue

For catalogues to be successful, the components ought to support a related set of concepts, so as to facilitate comprehension of the underlying conceptualisation of the domain. The usefulness of the catalogue depends crucially on whether these concepts are understood and valued by users, along with whether the components are implemented well. In short, a systematic taxonomy makes it possible for the designer of the catalogue, on the one hand, to decide which components must be included in the catalogue; on the other hand, it tells the user of the catalogue whether it may contain the components being looked for. Without a systematic taxonomy, neither the developer nor the user can be sure.

2) Components should be as generic as possible

A catalogue that has fewer components but supports the same functionality is straightforwardly better than one that has more components: a reduced component count makes it easier for users to find what they need, and it makes it easier for component developers to devote the effort needed to perfect the components. Making it possible to have fewer components implies that components are reusable in more contexts. This in turn translates to the condition that components make minimal assumptions about the context in which they are used.

3) Components should be as efficient as possible

The requirement of component efficiency addresses not only the minimization of resource
utilisation during use, but also the provision of documentation of the performance characteristics of the components, in order to enable potential users to assess the implications of component use on their whole system design. Clearly, SAFIRA will place a strong bias towards this second aspect. Even so, the efficiency of the components developed is planned to be assessed at a qualitative level (cf. also the next subsection on technical evaluation criteria).

4) Catalogues must be comprehensive

In addition to the first requirement of utilizing a consistent taxonomy so as to facilitate the orientation of component users, it is of interest to evaluate the comprehensiveness of the sum of functionalities provided. As holds for any design, the design of catalogues requires a trade-off between not including enough and including too much. In the context of SAFIRA, a primary interest will be an assessment of comprehensiveness in terms of the kinds of applications actually facilitated by the provided components and toolkit. This is to help in the exploitation of results by providing a guideline of what additional functionalities should usefully be included in future extensions of the software after termination of the project proper.
Furthermore, integration requires a set of critical success factors and measurement methods to be set for each component. These have to be adhered to in order to allow for smooth integration in a timely manner and to catch problems as they arise. These factors will serve to establish the stability of the system before release. Separate sets may also be constructed for the demonstrators at a later stage, although they will not be as essential, as these demonstrators are only a proof of concept. The critical success factors that will be assessed continuously during the project in a qualitative manner include:

• stability of standalone components as well as integrated component assemblies;
• provable conformance with specifications;
• efficient error handling and recovery;
• adherence to the high-level abstraction for plug-and-play capability as specified by the API;
• functional support for integration APIs;
• conformance with specified adequate real-time performance.

These requirements are discussed in fuller detail in the subsection on technical evaluation criteria.
1.1.2 WP3: Affective Sensory

Issues addressed
Task 3.1: Affective Input through Objects
Broadly, the issue addressed here is how the user can engage in an intimate relationship of semi-autonomous control with avatars such as drawing agents or inhabitants of a virtual world: how can emotions of such agents be influenced, while not controlled by the user?
One input component is being built in WP3 for the Influencing Machine demonstrator. It consists of a hardware component that includes postcards with UPC codes and a UPC code reader, and a software module that turns incoming ASCII codes from the code reader into the corresponding emotional influences. The software will provide a mapping function forming an important step in the chain of causing emotion influences to affect behaviour. The related main question to be evaluated is consequently to assess how well this software succeeds – in the context of specially designed hardware, and an emotional model that responds to it – in letting the user influence the avatar’s emotions, and through the emotions its behaviour. Given that evaluation issues for the emotional input system will be examined from this perspective, these will possibly turn out to be more appropriately dealt with in the context of whole-demonstrator evaluation, rather than the toolkit evaluation.
Another input component being built in WP3 is EToy, whose goal is to acquire information through the user’s manipulation of the physical object used to interact with the system: a puppet endowed with a set of sensors incorporated inside its body. These sensors are distributed in such a way as to allow the acquisition of standard movements of the limbs as well as the recognition of specific manipulations (e.g., exertion of pressure) of individual areas. As before, this information must be interpreted in order to infer what the action induced by the user actually was – by mapping acquired information to a pre-defined set of actions – and how this action was induced, i.e., what the emotional state of the user is likely to be.
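The mapping from acquired sensor information to a pre-defined set of actions might be sketched as follows. This is an illustrative assumption only, not the actual EToy implementation: the limb names, sensor fields, thresholds, and action labels are all invented for the example.

```python
# Hypothetical sketch of the EToy interpretation step described above:
# raw readings from sensors placed in the puppet's body are mapped onto
# a pre-defined set of actions. All names and thresholds are assumptions.

from dataclasses import dataclass

@dataclass
class LimbReading:
    limb: str          # e.g. "left_arm", "right_leg", "body"
    pressure: float    # normalised 0..1
    acceleration: float
    direction: str     # e.g. "up", "down", "none"

def interpret(readings: list[LimbReading]) -> str:
    """Map a set of limb readings onto one of a fixed set of actions."""
    legs = [r for r in readings if "leg" in r.limb]
    arms = [r for r in readings if "arm" in r.limb]
    # Fast alternating leg movement is read as "run", slower as "walk".
    if legs and all(r.direction in ("up", "down") for r in legs):
        return "run" if max(r.acceleration for r in legs) > 0.5 else "walk"
    # Raised arms suggest a greeting gesture.
    if arms and all(r.direction == "up" for r in arms):
        return "wave"
    # Strong squeezing of the body is read as a hug.
    if any(r.limb == "body" and r.pressure > 0.8 for r in readings):
        return "hug"
    return "idle"
```

Note that how an action is induced (intensity, acceleration) carries the affective information, while the pattern of movement identifies the action itself.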
Task 3.2: User Models and Emotions
In order to give a computer system the ability to act in a manner that is consistent with the user’s emotional state, it is necessary to provide the means to accurately acquire and correctly reason about these characteristics in the form of an affective user model component. This model must represent the set of emotional states that a person can experience in a given application domain, and dispose of mechanisms appropriate to manage them in a consistent manner.
State of the Art
Avatars may be talked about as though they are simple cyberspace extensions of the user; in the practice of building them, this idea quickly loses relevance. Avatars are often recalcitrant; they don’t do what I tell them, or at least not what I meant. It is not possible to actually construct avatars with persistence over any appreciable period of time, especially those with complex behaviours, without realizing that it takes a great deal of engineering effort to engender the illusion that the avatar is identical to the user. As Bowers, O'Brien and Pycock argue, often a great deal of technical and social effort goes into having the avatar behave non-autonomously, i.e. as a direct and accurate representative of the user. At the same time, several researchers have done innovative work that, rather than attempting to get rid of unwanted autonomy, uses that autonomy as a resource to create new, useful forms of the avatar-user relationship. Hannes Vilhjalmsson and Justine Cassell's pioneering system BodyChat, for example, uses autonomy in the form of body language to support interaction via avatars [Vilhjalmsson & Cassell 1998]. That is, while the user is chatting with other people, their avatars autonomously display the kinds of physical signals humans unconsciously use to support communication, like using glances to show whether or not one is open to communication, raising eyebrows on emphasized words, and using gaze exchange to support turn-taking. These are behaviours which are essential for supporting communication, but of which humans are generally unaware and which they would therefore find difficult to control directly. While Vilhjalmsson and Cassell's avatar does have some semi-autonomous behaviour, it is still a direct representative of the user - the avatar does what the user would do if s/he could. 
Interestingly, evaluation of their system suggested that users actually feel more in control of these avatars than ones where they had to directly control the avatar's body movements [Cassell & Vilhjalmsson 1999].
Avatars do not just behave; they also sense the virtual environment for the user. Michael Mateas has developed subjective avatars for interactive fiction, which behave non-autonomously, but have semi-autonomous sensing [Mateas 1997, 1998]. These avatars are intended to help the user feel like a character in a story, by sensing the world in a way that reflects the character's perspective on events,
D-SAFIRA-WP7-D7.1
© IST Project IST-1999-11683 SAFIRA 12
IST-1999-IST-11683:D7.1 30 April 2001
drawing out details that matter to the character and describing them in terms of their impression on that character. The avatar is not simply a representative of the user, but also reflects an author-chosen character.
In the Affective Tigger project [Kirsch, 1999], the main goal was to have a toy that would react emotionally to the emotions of the user. In contrast, the work pursued on EToy is more in the line of that carried out on Sympathetic Interfaces by [Johnson et al. 1999], who also face the problem of allowing a user to control a virtual agent that has its own autonomous behaviours. They built a plush toy in the shape of the agent, which the user can move in order to suggest behaviours to the agent; for example, moving the legs of the toy may cause the agent to run. The movements of the toy are interpreted according to the context in which the agent finds itself and which behaviours are then plausible. The agent thus has a great deal of latitude in interpreting the user “commands,” and engages in fully autonomous behaviour when the user does nothing. As in this work, EToy uses a voodoo-doll-like metaphor to describe this form of semi-controlling the agent.
Goals of the Components and Evaluation Criteria
The common goal of these components is to allow the user to influence the agent’s emotions in an understandable way over the course of the interaction.
In the Influencing Machine, this occurs in a straightforward way, by mapping each postcard to its emotional “content,” and then sending these influences to affect the emotional model. In this context, the following things are planned to be tested:
1) What kind of difference to the user experience does it make if the user uses physical postcards and a physical mailbox, versus virtual postcards on the screen that are clicked on?

2) The mapping from postcards to emotions is context-free, i.e. the mapper does not keep track of the history of the interaction. Is this adequate for understandable emotional influences, or would it be markedly better to include narrative, temporal effects in the mapping?

3) Can the users understand that they are influencing emotions, or is this too hard as an interface?

In EToy, the types of sensors employed determine which characteristics of the user’s manipulation of the puppet will be taken up and considered by the system. Three aspects of the user’s control of the physical interface are to be sensed: intensity, acceleration, and direction. It is expected that the information provided by these three variables will allow the system to predict the action induced by the user more reliably. This information shall also form a major contribution to the process of inferring the user’s emotional state (cf. the next subsection). In EToy, the assessment of the performance of this dynamic mapping of inputs to outputs thus comes as an additional entry to the three elements of the evaluation plan just mentioned.
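The contrast raised in point 2 above, between a context-free mapper and one that tracks interaction history, can be made concrete with a small sketch. The UPC codes, emotion labels, and weights below are invented for illustration; neither variant is the actual Influencing Machine mapper.

```python
# Context-free postcard-to-emotion mapping: each UPC code read from a
# postcard maps to a fixed set of emotional influences, regardless of
# what was read before. All codes and weights are illustrative.

POSTCARD_INFLUENCES = {
    "0001": {"joy": 0.6, "interest": 0.2},
    "0002": {"fear": 0.5},
    "0003": {"sadness": 0.4, "fear": 0.1},
}

def influences_for(upc_code: str) -> dict[str, float]:
    """Context-free mapping: output depends only on the current code."""
    return POSTCARD_INFLUENCES.get(upc_code, {})

class HistoryAwareMapper:
    """Sketch of the alternative raised in point 2: the same postcard
    has a diminishing effect as it recurs in the interaction history."""

    def __init__(self):
        self.seen: dict[str, int] = {}

    def influences_for(self, upc_code: str) -> dict[str, float]:
        n = self.seen.get(upc_code, 0)
        self.seen[upc_code] = n + 1
        base = POSTCARD_INFLUENCES.get(upc_code, {})
        return {emo: w / (1 + n) for emo, w in base.items()}
```

The evaluation question is then whether users find the second, temporally sensitive behaviour markedly more understandable than the first.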
In the affective user model component, a first issue regards the definition of a functional set of discrete emotional states the system is enabled to infer as a characterisation of the user’s current affective state.[2]

[2] This is probably better evaluated in the context of the whole system.
This set has to meet the criteria of being efficiently computable with sufficient precision on the one hand, and of properly discriminating between all the different states of relevance in the given application domain on the other. Accordingly, for each of the emotional states, the eliciting situations that can give rise to a certain emotion must be compiled in terms of the sensing capabilities of the system. Besides emotional states, the user model must also track information about the user’s preferences, goals, and knowledge. These concepts must be defined and represented in the system in a way that is immediately accessible to the domain experts who are to provide these inputs. Evaluation criteria from the knowledge acquisition for expert systems area are applicable to this issue.

A final difficult issue (cf. Annex I of Deliverable D4.1) is the modelling of the dynamics of the system. How well does the system succeed in keeping in sync with the actual state of the user? For how long should the effect of previously inferred emotions be taken into account in the computation of the user’s current emotional state? How do parameters such as the recency of an obtained result bias its use in the computation? Given that there are virtually no reported results on this topic to date, this topic will be of particular interest in the evaluation, including in particular any negative results.
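One simple candidate answer to the temporal questions above is to weight each previously inferred emotion by its recency using exponential decay. This is an illustrative sketch under stated assumptions, not the SAFIRA user model; the half-life parameter and the cap at 1.0 are inventions for the example.

```python
# Sketch: recency-weighted combination of previously inferred emotion
# intensities. Each observation is a (timestamp, intensity) pair for one
# emotion; older observations contribute exponentially less. The
# half-life value (in seconds) is an assumed tuning parameter.

def current_intensity(observations: list[tuple[float, float]],
                      now: float, half_life: float = 30.0) -> float:
    """Combine past inferences of one emotion, weighting by recency."""
    decayed = [i * 0.5 ** ((now - t) / half_life)
               for t, i in observations if t <= now]
    # Cap the combined intensity at 1.0 to stay in the normalised range.
    return min(1.0, sum(decayed))
```

Evaluating such a model would mean varying the half-life and checking which setting best keeps the inferred state in sync with the user's actual state.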
WP4: Reasoning and Planning with Emotions
Issues addressed
Task 4.2: Appraisal Mechanisms
The appraisal compiler is a software tool used to specify appraisal components. The agent designer specifies the appraisal modules that make up a particular appraisal component, the information required by each appraisal module, and some parameters necessary to create a component, such as the component address. The appraisal compiler must generate a component with the appraisal modules specified by the agent designer. The contents of each of the specified appraisal modules must be filled in by the agent designer using JESS. All other internal mechanisms of the component, such as the internal mailbox and the sending of generated emotion signals, are created automatically by the appraisal compiler. The appraisal compiler offers some domain-independent appraisal modules that can be included or modified by the agent designers in their applications: three planning-based appraisal modules and one message-loss appraisal module.
• Planning-Based Appraisal: The appraisal compiler offers three planning-based appraisal modules that extend Jonathan Gratch's interpretation of the OCC appraisal theory [Gratch 1999, 1999b, 2000]. One module detects actions whose execution was ordered by a planner component but failed to produce the expected results. Another module detects actions to be executed by an effector whose execution was not ordered by a planner component. A third module detects events that threaten conditions protected by a planner component.
• Message-Loss Appraisal: The appraisal compiler offers an appraisal module that detects that a given component has lost or is about to lose messages received in its internal mailbox.
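The structure just described can be sketched as follows. The class and method names are purely illustrative (the actual compiler generates components whose eliciting conditions are written as JESS rules); the sketch only shows the division of labour between designer-specified modules and the automatically created component shell:

```python
class AppraisalModule:
    """One appraisal module: a named eliciting condition plus the emotion
    signal it generates when the condition matches an event."""
    def __init__(self, name, condition, emotion):
        self.name, self.condition, self.emotion = name, condition, emotion

    def appraise(self, event):
        return self.emotion if self.condition(event) else None

class AppraisalComponent:
    """Shell a (hypothetical) appraisal compiler would generate: it owns
    the internal mailbox and routes events to every configured module,
    collecting the resulting emotion signals."""
    def __init__(self, address, modules):
        self.address = address
        self.modules = modules
        self.mailbox = []  # filled by the surrounding architecture

    def process(self, event):
        signals = [m.appraise(event) for m in self.modules]
        return [s for s in signals if s is not None]

# e.g. a planning-based module in the spirit of the ones above: it fires
# when an action ordered by the planner failed to achieve its expected result
failed_action = AppraisalModule(
    "action-failure",
    lambda e: e.get("type") == "action-result" and not e.get("achieved"),
    "distress")
component = AppraisalComponent("appraisal@agent1", [failed_action])
```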
Task 4.3: Decision making, Planning and Emotions
The planner component is to be of use in situations where an agent should be controlled by long-term goals, i.e., by a teleological mechanism. In contrast, in domains in which it is sufficient to merely react to events for which a response can be programmed ahead of time, it is better to use a production-system style of control. In the general case, it is a good idea to have an agent controlled by two kinds of components: a
planner component to deal with long-term goals and production system components to deal with local short-term events. The difficulty of using such a hybrid control mechanism in the SAFIRA agent architecture is twofold:
1) If an effector receives two incompatible orders from the planner and from one of the production systems, which of them takes precedence?
2) If a production system is an independent component, how can its behaviour be influenced by the long-term memory system shared across the whole architecture?
The planner component receives a goal expression, a description of the current state of the world, and a theory of the domain in the form of an action theory. Based on these inputs, it determines what actions should be executed and when. Important issues in planning include: the occurrence of significant changes of the world state due to exogenous events before completion of a planning cycle; the useful employment of correct but incomplete information about the world; support for changes in the agent's achievement capabilities that result from inter-agent negotiation; and support for actions whose effects cannot be predicted completely before their execution. Ideally, the planner component should be capable of facing hard problems in demanding situations. In reality, however, the planner component will be developed incrementally, employing different solutions in different instantiations. It is therefore to be expected that it will not exhibit all desirable features at the same time. As the requirements mentioned are addressed over the project lifetime, the evaluation task will document how different versions of the component cope with them. That said, it must be noted that the purpose of the present project is not to develop sophisticated planning algorithms, but rather to develop and study agents with emotions.
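The planner's interface (goal expression, world state, action theory) can be made concrete with a deliberately tiny sketch. This is not the SAFIRA planner; it is a minimal breadth-first STRIPS-style search, included only to show the shape of the three inputs named above:

```python
from collections import deque

def plan(goal, state, actions):
    """Toy planner: `state` and `goal` are sets of facts; each action in
    the action theory is (name, preconditions, add_list, delete_list).
    Returns an action-name sequence achieving the goal, or None."""
    start = frozenset(state)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        current, steps = frontier.popleft()
        if set(goal) <= current:
            return steps
        for name, pre, add, dele in actions:
            if set(pre) <= current:  # action applicable in current state
                nxt = frozenset((current - set(dele)) | set(add))
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None  # no plan found
```

None of the hard issues listed above (exogenous events, incomplete information, negotiation, unpredictable effects) are handled here; the sketch only fixes the vocabulary for discussing them.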
Task 4.4: Evolution and Development of Emotions
The goal of the emotional development software model is to provide support within an agent's mind for keeping track of the developmental state of the agent. The software runs as a separate thread, monitoring conditions that may cause the developmental state to change and answering queries about the current developmental state. We are not aware of any convincing demonstration of a general model of development in any intelligent software system. Systems based on self-organization are not adequate for situations in which one wants to be able to control the outcome of development, e.g. to generate development that follows known human development. However, controlling development by hard-coding its stages does not allow different kinds of development to be implemented in the same framework.
Goals of the Components and Evaluation Criteria
The evaluation of the appraisal compiler must determine whether the described specifications were actually implemented and whether they work without errors. In addition, the usefulness of the appraisal compiler for creating a component from the specifications given by the agent designer shall be assessed. In this generic use, no specific theory of appraisal is used; therefore agent designers have complete freedom to specify any theory. The only requirement is that the emotion-eliciting conditions to be specified by the agent designer can easily be represented as JESS rules. If agent designers want to use planning-based emotion-eliciting conditions, they may use the planning-based appraisal modules offered by the appraisal compiler. Agent designers are free to modify those modules at will. Although these modules are planning-based modules, it is not strictly necessary that the SAFIRA planner component be used in the agent. In fact, it is not even necessary that the agent have a component that implements a planning
algorithm. What is required in any case is that the agent include a module in the role of a deliberative controlling system that selects actions to be executed with expected outcomes, and that can inform the appraisal module that given conditions must hold during specified time intervals. The agent's effectors must also know which component ordered the execution of which actions. All this means that agent designers may find it useful to use the offered planning-based appraisal modules even in cases in which the agent does not have a planner component.
The main goal for the planner is to provide a component in which the influence of emotion in planning arises as the result of generic automatic mechanisms, which are not explicitly specified as part of the theory of the domain to be used by the planning algorithm. That is, the planning operators should not have to include specifications related to the way emotion influences the agent behaviour. The way emotion influences the agent’s cognition and behaviour is specified by two distinct and separate mechanisms: by the specification of appraisal modules, and by the specification of the emotion-responses. General domain-independent emotion-responses are programmed into the agent architecture, independently of the planner. Specific emotion-responses are also programmed into the agent architecture but their execution will create and modify data structures used by the planner. These changes however are not specified as part of the planning operator specification. In the specific case of the SAFIRA planner component, the emotion-responses are represented in Salt & Pepper long-term memory nodes. The influence of emotion in planning is built upon the general mechanism by which emotion influences the workings of long-term memory. The agent designer must specify both the emotion-eliciting conditions and the emotion-responses, but these specifications should be made independently of the planning operator specification. The planner component will be evaluated according to whether it fulfils the specified planning requirements; however, this is not the main purpose of the project. Besides planning abilities, the planner component will be evaluated mainly with respect to the degree to which the influence of emotion in cognition and behaviour does not have to be explicitly represented in the specification of the planning operators. 
It is also important to evaluate if designers find it useful to have this separation between the specification of emotion and the specification of planning.
The main goal of Task 4.3 is to study the behavioural differences introduced by using emotions in the agent architecture. It is assumed that natural emotions have several important roles, including adaptive ones. A research goal is to show that it is possible and useful to build artificial mechanisms for autonomous agents that can play part of the adaptive roles of natural emotions. In terms of the planner component, the main evaluation criterion is the degree to which differences between what happens with and without emotions can be observed.
Two versions of the planner component will be developed: the Flat-Planner and the Em-Planner. The Flat-Planner will use exactly the same planning algorithm as the Em-Planner, but will not be implemented on top of the Salt & Pepper memory system and will not receive emotion-signals. The Flat-Planner also does not provide any information to appraisal modules. In contrast, the Em-Planner will be the emotional version of the Flat-Planner. It will receive emotion-signals that may give rise to emotion-responses and influence the way the component works, since they affect the accessibility of long-term memory nodes used by the planning algorithm. The planned procedure to evaluate the planner component thus consists of the following steps:
1) Evaluation of the degree of similarity between the inputs given to the Flat-Planner and those given to the Em-Planner (i.e., state of the world, goal expression, theory of the domain). These inputs should be as similar as possible, since further evaluation will be meaningful only to the extent that the inputs are similar.
2) Determination of the qualitative differences between the observed performance of an emotional agent controlled by the Em-Planner and an agent without emotions controlled by the Flat-Planner. In addition, an assessment of quantitative differences may optionally be carried out.
3) Determination of the degree to which the current conceptual framework can be used to understand the observed performance differences.
4) Assessment of the degree to which general conclusions can be abstracted about the usefulness of using emotions in artificial autonomous agents.
5) Optionally, it may also be useful to compare the performance of an agent with the Em-Planner that really uses emotions with that of an agent controlled by the Em-Planner, but without the generation of emotions.
6) Given that the memory planner component is not a real-time component, it cannot be evaluated in a real-time context in the strict sense (hard real time). Even so, the runtime performance, e.g. in terms of a characterisation of the latency between the input data and appraisal answers (i.e., emotion signal generation), shall be assessed (soft real-time characteristics).
With respect to emotional development, given the paucity of understanding in this area and the rich complexity of human development, the goal of the component is to provide a lightweight, reusable model of development. It is not intended to be the last answer on development, but to provide a simple, first-draft utility demonstrating one way in which development can be modelled, one that can successfully be used to drive behavioural change in an agent. We limit our definition of emotional development to a process of moving through a graph of developmental states, either as a result of the passage of time or in consequence of conditions that occur outside of the developmental model. Multiple states may be occupied simultaneously. Associated with each state is a property that holds in that state. These properties are purely abstract, which means that the development component does not "know" anything about them; for it, they are only distinct labels. The association of a label with actions in or outside the mind of the agent is done outside the scope of the component. Neither is the developmental model aware of the "meaning" of the conditions that can cause changes to the developmental state. Instead, it keeps track of the names of the conditions that currently must be monitored. Actually evaluating those conditions lies, again, outside the scope of this component. The major evaluation of this component will occur in the construction of the Influencing Machine demonstrator in WP6. The major questions to be answered are as follows:
1) Is it possible to black-box development from the rest of the behaviour architecture in a useful way?
2) What are the advantages and limitations of this approach from a behaviour programming point of view?
3) Is the system meaningfully reusable? 3

3
Note that this question cannot be answered from one demonstrator alone!
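The development model described above (a graph of opaque state labels, transitions guarded by named conditions evaluated outside the component, several states possibly active at once) can be sketched as follows. The class and method names are illustrative assumptions, not the delivered component's API:

```python
class DevelopmentModel:
    """Minimal sketch of the development component: states are opaque
    labels, transitions are guarded by condition *names*, and actually
    evaluating those conditions happens outside this component."""
    def __init__(self, transitions, initial):
        # transitions: {(state, condition_name): next_state}
        self.transitions = transitions
        self.active = set(initial)  # multiple states may be occupied

    def monitored_conditions(self):
        """Names of the conditions the architecture must currently watch."""
        return {cond for (state, cond) in self.transitions
                if state in self.active}

    def notify(self, condition_name):
        """A condition was reported true from outside: advance every
        active state that has a matching outgoing transition."""
        for (state, cond), nxt in list(self.transitions.items()):
            if cond == condition_name and state in self.active:
                self.active.discard(state)
                self.active.add(nxt)

# hypothetical drawing-development graph, loosely evoking stages of
# children's drawing; the labels mean nothing to the component itself
model = DevelopmentModel(
    {("scribbling", "grip-mastered"): "shapes",
     ("shapes", "shapes-mastered"): "figures"},
    initial={"scribbling"})
```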
WP5: Communicating and Expressing Emotions
Issues addressed
Task 5.1: Facial Expressions
The Facial Expression Mechanism (FEM) component encapsulates the mechanisms required to generate real-time expression in an animated face. The task is to produce facial animation based on Ekman's Facial Action Coding System (FACS) [Ekman & Friesen 1978], in which expressive facial motions are derived from unique combinations of facial muscle groups, identified and classified into action units that can be combined to formulate a comprehensive range of facial expressions. To provide varying degrees of expression intensity, these action units are contracted or relaxed together, according to the behaviour to be represented. FEM is to provide the mappings from emotional parameters to the parameters of standard facial animation packages. It will utilise readily available facial animation tools to achieve this mapping and generate the affective expressions. The delivered tool shall facilitate the expression of emotion in 2D/3D facial animation by providing a mechanism to define syntactic, semantic and pragmatic character presentation attributes using structured text. The expressive behaviour manifested in a full facial animation complements research on full body representations carried out in Task 5.2.
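The emotion-to-action-unit mapping FEM must provide can be sketched as a small table plus a uniform intensity scaling. The AU numbers below follow common FACS usage (e.g. joy as AU6 cheek raiser plus AU12 lip corner puller); the mapping actually used by FEM may well differ, so treat the table as an illustrative assumption:

```python
# Illustrative emotion-to-action-unit table (FACS AU numbers)
EMOTION_TO_AUS = {
    "joy": [6, 12],           # cheek raiser + lip corner puller
    "surprise": [1, 2, 5, 26],
    "anger": [4, 5, 7, 23],
}

def facs_activations(emotion, intensity):
    """Contract the action units of an expression together, scaled by a
    single intensity in [0, 1], as described for varying expression
    strength."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    return {au: intensity for au in EMOTION_TO_AUS[emotion]}
```

The output (a dict of AU number to activation level) is the kind of parameter set that would then be mapped onto a concrete facial animation package.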
Task 5.2: Body Expression
The overall goal of the Affective Body Animation component is to provide a simple way of creating and controlling reusable bodies of synthetic characters that are able to express emotions. The body assumes a natural relevance when creating emotionally expressive agents, particularly synthetic characters. Humans, for instance, express their emotions and inner feelings not only with facial expressions, but also with gestures and behaviours that affect body movement. Thus, a believable emotionally expressive synthetic character should be able to express its emotions through bodily animation using the appropriate postures and gestures. The first serious challenge is how to affect the synthetic character animations to express emotions. Unless we want to maintain a different version of each animation for every emotion, which is not very reasonable, we must modify the animations in real-time to convey the desired emotion.
The EMOTE system [Badler et al. 2000] is based on Laban Movement Analysis and can produce a wide range of expressive and natural looking movements. The motion blending is achieved using PaT-Nets [Badler et al. 1993] as abstractions of particular body parts. Each net can use simple interpolations to ensure continuity. However, the resulting characters are limited by the rigid skeleton structure enforced by the underlying geometry. Perlin [Perlin & Goldberg 1996] proposes a three-layer architecture: geometry, animation engine and behaviour engine. The animation engine ensures real-time motion blending with smooth transitions based on the manipulation of geometry abstractions called DOFs (degrees of freedom). Blumberg [Blumberg & Russell 1999] uses a three-layer architecture similar to Perlin’s architecture: geometry, motor system and behaviour system. The motor system is responsible for the atomic movements and ensures appropriate transitions between them. Our work is loosely based on Perlin and Blumberg’s architectures, but it introduces an emotional layer that affects the body posture to express the emotions. The idea is to combine an animation with different postures in real-time to achieve affective body animations.
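The idea of an emotional layer that combines an animation with a posture in real-time can be sketched as a per-joint interpolation between the current animation frame and an emotion-specific posture. The representation (joint name to angle) and the linear blend are simplifying assumptions for illustration:

```python
def blend_pose(animation_pose, emotion_posture, weight):
    """Blend the current animation frame with an emotional posture.
    Poses map joint names to angles (degrees); `weight` in [0, 1] is the
    strength of the emotional influence. Joints absent from the posture
    are left untouched."""
    blended = {}
    for joint, angle in animation_pose.items():
        target = emotion_posture.get(joint, angle)
        blended[joint] = (1.0 - weight) * angle + weight * target
    return blended

# e.g. a 'sad' posture curls the spine forward while a walk cycle plays
sad_posture = {"spine": -30.0}
frame = blend_pose({"spine": 0.0, "neck": 0.0}, sad_posture, 0.5)
```

Applying this blend every frame yields the affective body animation described above: the walk remains the walk, but the posture carries the emotion.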
The second challenge is how to define a control interface independent from the character class, since each character class has specific animations and, sometimes, basic actions such as moving vary from class to
class. The Microsoft Agent [Microsoft 1998, Microsoft 1998b] programming interface offers a list of generic methods for character control, and each character class can parameterise a method with its own animations. Our work uses the same approach as the Microsoft Agent: for example, a generic move command can have different animations depending on the character class. Additionally, to enrich the expressiveness of the character classes, there are generic play and stop animation commands to allow class-specific animations.
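A minimal sketch of this class-independent control interface, with generic commands resolved to class-specific animations (all names are illustrative, not the component's actual API):

```python
class CharacterClass:
    """Maps generic commands to class-specific animation clips, in the
    style of the Microsoft Agent approach described above."""
    def __init__(self, name, animations):
        self.name = name
        self.animations = animations  # generic command -> animation clip

    def perform(self, command, **params):
        clip = self.animations.get(command)
        if clip is None:
            raise ValueError(f"{self.name} has no animation for {command!r}")
        return {"clip": clip, "params": params}

# the same generic 'move' command resolves to different animations
knight = CharacterClass("knight", {"move": "armoured-walk", "greet": "bow"})
sprite = CharacterClass("sprite", {"move": "hover", "greet": "twirl"})
```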
Task 5.3 Affective Rendering
The affective expression component generates child-like drawings based on emotional information. These drawings seem to communicate emotional and personality information. There are two levels at which this component is delivered:
1) A version of the component that takes emotions and developmental states as input and produces appropriate drawings as output.
2) A low-level version of the component that takes commands to a pen as input and generates non-photorealistic, real-time strokes as output.
Both versions are provided so that subsequent researchers can try approaches to modelling children's drawings different from the one we take, and compare the approaches to one another.
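The low-level pen interface can be sketched as follows. Recording time-stamped, pressure-annotated points per stroke is what makes the *process* of drawing (not just the finished image) available to the higher-level behaviour system; the names are illustrative assumptions:

```python
class Pen:
    """Sketch of the low-level stroke interface: pen commands accumulate
    pressure-annotated points, one list per stroke, so that stroke
    dynamics remain observable in real time."""
    def __init__(self):
        self.strokes = []       # completed strokes
        self._current = None    # stroke being drawn, if any

    def pen_down(self, x, y, pressure=1.0):
        self._current = [(x, y, pressure)]

    def pen_move(self, x, y, pressure=1.0):
        self._current.append((x, y, pressure))

    def pen_up(self):
        self.strokes.append(self._current)
        self._current = None
```

A renderer consuming this record can vary line weight and pacing from the pressure and point spacing, which is exactly the information the evaluation questions below (lifelike pacing, emotion through pressure and colour) depend on.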
The problem the component addresses is how to generate images by computer that communicate a subjective view of the world. By far the majority of computer-generated imagery is photo-realistic, i.e. creates a ‘photograph’ of a 3D, CAD/CAM-like model. Even non-photo-realistic rendering is generally concerned with creating an objective image of such a model, which is portrayed in a painterly, rather than photographic way. But, by far the majority of human-created imagery makes no attempt to be objective; rather, the point of the imagery is to communicate an author-chosen perspective. We would like to generate similar imagery automatically, in a way similar to children: dynamically (i.e. as an observable process in real-time, with emotions being reflected not only in the end-result but also in the dynamic qualities of the strokes which make up the picture), freely (without an input model to be represented), developing over time, expressing the emotional state of the drawer, and, perhaps most importantly, charming.
There are two major drawing programs which act as precedent for our work in this area. The first is Aaron, by Harold Cohen [Cohen 1995, McCorduck 1991]. Aaron is a program that generates drawings by controlling a robot arm. It models an expert artist (i.e. Cohen himself!), and can generate a wide variety of drawings of natural scenes with human figures in Cohen’s (now Aaron’s) signature style.
Figure 0-1: Picture generated by Aaron
While Aaron is certainly inspirational for our work, there are several differences in our approach. The approach we take here is not to model an expert, but to model the kind of drawing anyone (under the age of 5) can do. In this way, our approach to graphics is similar to the change in thought which occurred in AI in the 1980s, when people realized that AI programs could achieve expertise in areas like chess that are very difficult for humans, while being unable to engage in 'trivial' tasks like walking across the room or recognizing a human face in different lighting conditions, which any child is capable of doing. From this perspective, our most direct inspiration is Ed Burton's Rose [Burton 1997a, 1997b], a program that generates childlike drawings of 3D models. The Rose program takes as input a CAD/CAM model, analyses it in a way metaphorically similar to children's perception, and then produces a childlike drawing of it (Figure 0-2). Rose's drawings clearly show the level of charm we aim for in our work (Figure 0-3).
Figure 0-2: Screenshot of Rose, showing input 3D model,
Rose’s internal representation, and the resulting picture Rose draws.
Figure 0-3: Picture of a man with a bucket and a cow.
Task 5.4: Affective Speech
A framework that supports affective interactions requires a component for the generation of affective speech in order to express emotions via language. For this reason, the SAFIRA toolkit has to provide such a component. Currently, there are a number of components available for the generation of speech which also allow the specification of (mostly just a few) emotional parameters. An overview is provided online at http://www.dfki.de/imedia/safira/related_research.html. However, to date none of these have been integrated into an affective framework such as SAFIRA's. As stated in the workpackage description of the Affective Speech Module, use will be made of existing implementations for text generation and speech synthesis. Currently, the TG/2 text generation system and the Festival speech synthesizer are used. Should better-quality tools become freely available during the runtime of the project, it should be possible to integrate them easily.
Task 5.5: Affective Inter-Agent Communication
The overall goal of affective inter-agent communication is to have expression of emotion conveyed not only to the user, but also to the other participating software agents in a social multi-agent system. In order for the agents to have a semantic and contextual understanding of the information to be exchanged, they must have some knowledge about the surrounding environment, including affective knowledge about internal and external states.
Goals of the Components and Evaluation Criteria
The facial expressions module is to provide adequate, i.e., sufficiently complete, versatile and performant, dynamic facial generation capabilities. With respect to content coverage, the Facial Action Coding System [Ekman & Friesen 1978] will be used as a reference for evaluation. In terms of the Character Markup Language (CML) employed [André et al. 2001c], issues to be evaluated include:
1) Whether the script provided by CML is rich enough to allow for adequate parametrisation for the purposes of emotional behaviour animation;
2) Assessment of the quality (accuracy, effectiveness) of synchronisation of animation with speech;
3) Effectiveness in support of transitions between, or blending of, individual expressions;
4) Versatility of CML syntax to support different animation formats such as MPEG-4, MS Agent, or Faceworks.
With respect to technical evaluation criteria, such as performance and properties of interfaces and internal design, the procedures detailed in section 0 will be applied. As the information content of facial expression taken in isolation is known to be limited (cf. the pertinent discussion in Deliverable D4.1 [André et al. 2001b]), a substantial part of the evaluation of the facial animation component will be carried out in the context of whole demonstrators, as discussed in section 2 of the present document. In particular, with respect to the facial animation tool (probably JOE Face, cf. section 3 in [André et al. 2001c]), evaluation is planned to assess:
1) Effectiveness of the emotional expressions delivered;
2) How recognizable these expressions are;
3) Efficiency of the synchronization techniques employed.
The Affective Body Animation component is to provide a means to create and control bodies of synthetic characters in real-time. The component, as delivered, will provide at least two classes of characters ready to use (the character library). It will receive commands through a socket-based interface. These commands shall allow the user to create and delete bodies from a class present in the character library, perform actions on the bodies, and send emotional signals to them. The actions correspond to body animations and movements, while the emotional signals affect the body postures. The component blends the active animations and the actual posture in real-time to achieve natural and expressive character behaviour.

The minimal list of emotions to be supported will be the list of basic emotions defined by Ekman [Ekman 1982]: anger, disgust, fear, joy, sadness and surprise. These emotions are selected from the universal facial expressions. Their relevance when used in body expression will be a separate object of evaluation. In the future, the list may be extended to support more emotions that can be expressed by the body.

The SAFIRA toolkit is to include an application that is able to preview the real-time blending between animations. It shall offer the designer the possibility of watching the character as if it were already in action. The designer should be able to parameterise the animations and observe the results immediately. Additionally, the application will also help in the creation of complex animations. The designer will be able to define and parameterise sequences of basic animations to achieve more complex animations (e.g. 'pick up an object' can be defined as the following sequence of basic animations: look at the object, walk towards the object, stretch the arm and grasp the object). The resulting library of animations is to be used by the Affective Body Animation module.
This component will be used in the demonstrator FantasyA to control the synthetic characters that inhabit FantasyA worlds. Related evaluation issues are discussed in the second part of this deliverable.
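The socket-based command interface described above can be sketched as a small line-oriented protocol. The grammar below (CREATE/DELETE for bodies, ACTION for animations, EMOTION for posture-affecting signals) is an assumption for illustration, not the delivered interface specification:

```python
def parse_command(line):
    """Parse one line of a hypothetical command protocol for the
    Affective Body Animation component."""
    parts = line.strip().split()
    verb = parts[0].upper()
    if verb == "CREATE":      # CREATE <class> <body-id>
        return {"op": "create", "class": parts[1], "body": parts[2]}
    if verb == "DELETE":      # DELETE <body-id>
        return {"op": "delete", "body": parts[1]}
    if verb == "ACTION":      # ACTION <body-id> <animation>
        return {"op": "action", "body": parts[1], "animation": parts[2]}
    if verb == "EMOTION":     # EMOTION <body-id> <emotion> <intensity>
        return {"op": "emotion", "body": parts[1],
                "emotion": parts[2], "intensity": float(parts[3])}
    raise ValueError(f"unknown command: {line!r}")
```

A server side would dispatch "action" commands to the animation blender and "emotion" commands to the posture layer, keeping the two channels (animations vs. posture-affecting signals) separate, as the component description requires.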
The Affective Rendering component focuses on the following extensions over the previous work discussed:
1) The ability to explicitly model and communicate emotion and personality through the drawings.
2) The generation of dynamic drawings, i.e. where not only the output, but also the real-time process of doing the drawing, is child-like.
3) The elimination of an input model, which is not realistic for children's art before the age of four or five. Instead, the 'content' of the drawing is driven by the agent's emotional state.
4) The modelling of development, so that the program is capable of generating different stages of children's drawings.
The stroke system in Affective Rendering will be evaluated for its ease of use in generating non-photorealistic, dynamic figures as a part of the behavioural drawing system. The major aspects that need to be tested are the following:
1) Is the system reasonably and easily controllable in its use for a behavioural system?
2) Do the pacing and form of the strokes seem lifelike?
3) Can the basic stroke forms be used to communicate emotion, for example through variation of pressure and colour?
The behaviour system of Affective Rendering will largely be evaluated as part of the evaluation of the associated demonstrator. The goal of this evaluation will be to determine whether the behavioural system is effective at displaying a range of chosen emotions, such as: Anger vs. Peace/Calm; Smothered vs. Free; Action-oriented vs. Being-oriented; Uncertain/Insecure vs. Assured; Dependent vs. Independent; Primal unity with world vs. Sense of Self; Control/Stasis/Rigidity vs. Flow/Dynamics; Warm vs. Cold; Happy vs. Sad; Aggressive vs. Passive; Rational vs. Passionate; Introverted/Asocial vs. Extraverted/Reaching out to others; Safe vs. Fearful; Energetic vs. Weak.
The objective of the Affective Speech component is to provide an exemplar of an affective speech generation module and to integrate it into the SAFIRA toolkit. The sample integration should show other researchers how to build their own applications including components for affective text and speech. It is important to note that the evaluation should not assess the quality of the module itself so much as validate its flexibility with respect to integration:
1) It should be possible to integrate text and speech modules easily into the SAFIRA framework following the interface scheme for affective speech generation modules.
2) Other researchers should be able to use text and speech modules without knowledge of the internal data processing.
3) The internal component architecture should include simple and declarative methods for specifying domain-specific knowledge for expressing affect via language.
As a concrete use case for the Affective Speech Module, the wine demonstrator domain will be employed. Related evaluation issues are discussed in the second part of this deliverable.
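The interface scheme whose flexibility is to be validated can be sketched as a module with pluggable text-generation and synthesis backends, so that e.g. TG/2 and Festival could later be swapped for better tools without touching client code. All names are illustrative assumptions; the toy backends merely stand in for the real systems:

```python
class AffectiveSpeechModule:
    """Sketch of the integration scheme: text generation and speech
    synthesis are injected as callables, hiding their internal data
    processing from client code."""
    def __init__(self, generate_text, synthesize):
        self.generate_text = generate_text
        self.synthesize = synthesize

    def speak(self, content, emotion):
        text = self.generate_text(content, emotion)
        return self.synthesize(text, emotion)

# toy stand-ins for a text generator and a synthesizer
module = AffectiveSpeechModule(
    generate_text=lambda content, emo:
        f"{content}!" if emo == "joy" else f"{content}.",
    synthesize=lambda text, emo: {"text": text, "prosody": emo})
```

Point 2) above corresponds directly to the injection: a researcher swapping in a different backend needs only to honour the two callables' signatures.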
Finally, Affective inter-agent communication is to employ the notion of meta-level knowledge representation of affective relations, as annotations of objects being manipulated between the entities of the performing multi-agent system, so as to realise the required integration of contextual information and pure affective content. Evaluation will research not only the degree to which this integration is actually achieved, but also investigate the results of attempts to export the results achieved into standards for agent
technologies such as FIPA, in particular with respect to agent communication languages and communication policies. Specific assessments of this part of the evaluation plan include:
1) Clarification of the role of emotion in inter-agent communication;
2) Adequacy of the proposed introduction of emotion indicators on the intentional (as opposed to content) level of communication acts, and their impact on the efficacy of conveying emotional state and on the conversational process as a whole;
3) Effectiveness of agent profiling and the definition of agent roles in terms of the determination of, and influence on, agents' emotional states.
TECHNICAL EVALUATION CRITERIA
The technical evaluation criteria capture all aspects that are not immediately dependent on the project's subject domain, but are considered good practice according to current references in software engineering. This information is considered of particular relevance also so as to ensure the exploitability of the results produced in SAFIRA after termination of the project, as explicit documentation of these dimensions is of straightforward relevance for the practicality of the use of the component framework and toolkit by third parties (cf. discussion of WP2 in section 0). Clearly, at the same time, the possibility of addressing the whole range of candidate criteria identified explicitly during the preparation of the evaluation plan, and listed in the following, is constrained quite severely by the limited resources and by the SAFIRA project's main focus on the application domain.
As for the semantic evaluation criteria discussed before, inputs for this list of evaluation topics were elicited from and verified by the consortium members according to their membership of the three roles: providers, integrators and end users. The resulting list of technical assessment criteria was additionally verified with respect to good practice according to current references in software engineering [Brown et al. 1995, Kazman 1995, Abowd et al. 1996, Klein & Kazman 1999, Bachman et al. 2000]. While the items in this section are of relevance for all parties involved in the software production process, it is the interaction between providers, integrators and end users on these topics that leads to the particular distribution that reflects the priorities within the SAFIRA project, which necessarily remain subordinate to the semantic criteria discussed previously. The degree to which individual entries of this list will finally be covered in the evaluation reports will depend substantially on the insights gained during development. With these caveats, we document in the following the overall list of technical evaluation criteria agreed upon by all consortium partners, grouped into six categories. The list is completed with a subsequent specification of relative priorities, as estimated at the present stage of the project.
Code Quality
Code quality automatically becomes of particular importance when maintainability moves up in the list of overall criteria, as it does for software packages that, as in the present project, are intended for open-source distribution. At the same time, trade-offs due to resource and time constraints and the heterogeneity of the project consortium cannot be ignored. Taking all these aspects into account, the following items are planned as a practical compromise:
1) Code verification: every piece of code produced should come with the means to assess its conformance to documented requirements (see also 0);
2) Code quality: while a comprehensive, detailed assessment of code quality is clearly not feasible, some qualitative measures reflecting the judgement of the developers themselves, as well as a grading by the consortium as a whole, ought to be utilized. Such an assessment might be given on multiple scales, reflecting code maturation (developer, alpha, beta, final); comprehensiveness (prototype, minimal, typical, comprehensive); portability; etc.;
3) Style coherency: coherent employment of abstractions ought to be explicitly verifiable and verified, e.g. across components sharing communication channels. On the other hand, adherence to common global style guides will not be pursued.
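The multi-scale grading under item 2 can be made concrete as a small data structure. The following is an illustrative sketch only; the class and its scale names mirror the scales listed in the text but are not part of the toolkit:

```java
/** Sketch of the multi-scale code-quality grading mentioned under item 2.
 *  The enum values mirror the scales listed in the text. */
public class QualityGrade {
    public enum Maturation { DEVELOPER, ALPHA, BETA, FINAL }
    public enum Comprehensiveness { PROTOTYPE, MINIMAL, TYPICAL, COMPREHENSIVE }

    public final Maturation maturation;
    public final Comprehensiveness comprehensiveness;

    public QualityGrade(Maturation m, Comprehensiveness c) {
        this.maturation = m;
        this.comprehensiveness = c;
    }

    /** Compact grade label, e.g. "BETA/TYPICAL". */
    @Override public String toString() {
        return maturation + "/" + comprehensiveness;
    }
}
```

A component's developers would self-assess on each scale, with the consortium grading as a cross-check.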
Performance
Performance criteria form the second group of evaluation items. This category is of particular relevance given that SAFIRA explicitly addresses real-time issues. The entries in this category were chosen as follows:
1) System requirements: focus on those requirements with significant impact on performance; these effects ought to be documented along with their distribution among components (e.g., whether some useful subset of functionality might be achieved with limited premises) and with an assessment of their criticality. Requirements and impacts of particular performance drivers shall be documented where available, such as specific choices of hardware; compiler and runtime systems; COTS (commercial off-the-shelf) components or, more generically, any required components not developed by the consortium; and integration costs, such as benefits or penalties incurred by specific combinations of hardware, compiler and COTS;
2) System architecture and design: aspects of system architecture and design of particular relevance for performance control include the particular choice of scheduling model and the available means to affect the prioritising of different entities; another important aspect regards the management of shared resources such as communication and I/O channels, memory or CPU time;
3) Real-time performance: as already mentioned, SAFIRA explicitly targets real-time application domains; to that end, every component ought to provide a proper analysis of pertinent real-time constraints, covering a specification of the kind of constraints incurred (hard or soft real-time), along with whatever information might be of relevance for applicable schedulability analyses and practical estimates of achievable performance. With respect to resource consumption, impacts on CPU, memory, network, devices and sensors ought to be considered. Complementing issues mentioned in the previous entries, the different possibilities of resource arbitration employed (offline; online with fixed or dynamic priorities; queuing policies; pre-emption mechanisms; etc.) will be
documented. Crucial information about real-time performance includes characterisation of
response behaviour in terms of predictability (description of fundamental behaviour type); latency (e.g. typical and guaranteed response windows; criticalities; best, average and worst cases; delay jitter); throughput (e.g., response windows; criticalities; case type analysis); and precedence (criticality; partial or total ordering; etc.). A final measure regards performance headroom, e.g. with respect to peaks of data (in particular, typical start-up bursts); flow control; load growth; and with respect to individual system resources;
4) Performance under abnormal conditions: this item includes fault tolerance as well as explicit handling of anomalous circumstances, in particular: error detection; error reporting; error behaviour; default handlers; recovery (from overflows, lost connections, etc.); and system stability;
5) Performance tests: while these can be viewed as implicitly covered by all of the previous entries in this category, the definition and documentation of explicit test cases (specific, domain-dependent ones as well as generic, independent ones) is to be evaluated explicitly. On the other hand, the provision of test generators and the supply of evaluation guidelines and tools lie outside the scope of the project in a narrow sense, but will be documented where available.
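To illustrate the kind of latency characterisation asked for under item 3 (typical and worst-case response windows, delay jitter), a component might expose bookkeeping along the following lines. This is a hypothetical sketch, not an existing toolkit class:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Minimal sketch of the latency bookkeeping a real-time component might expose. */
public class LatencyProfile {
    private final List<Long> samplesMicros = new ArrayList<>();

    /** Records one observed response time, in microseconds. */
    public void record(long micros) {
        samplesMicros.add(micros);
    }

    /** Median response time, a proxy for the "typical response window". */
    public long typical() {
        List<Long> sorted = new ArrayList<>(samplesMicros);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }

    /** Worst observed case. */
    public long worstCase() {
        return Collections.max(samplesMicros);
    }

    /** Delay jitter, here taken as the spread between best and worst case. */
    public long jitter() {
        return Collections.max(samplesMicros) - Collections.min(samplesMicros);
    }
}
```

Such measured figures would feed the documentation of response behaviour called for above; hard real-time guarantees would additionally require an analytic bound, which sampling alone cannot provide.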
Management Support
Management support is of particular importance for facilities offering wide ranges of adaptability or requiring tuning set-up before or during utilization; error identification and repair; or maintenance of deployed systems (as opposed to maintenance of the software developed, addressed in the next category). The characteristics to be evaluated from this category are:
1) Configurability: e.g. in terms of provision for the storage and reloading of parameter settings;
2) Snapshots: the capability to capture runtime configurations and resume operation from given situations at arbitrary times;
3) Logging and monitoring: availability of facilities for tracing and supervising different aspects of runtime behaviour.
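As an illustration of item 1, the storage and reloading of parameter settings could be handled with standard Java properties files. The class name and the parameter key are invented for illustration only:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

/** Sketch of the configurability facility: parameter settings that can be
 *  stored to and reloaded from a file, as suggested under item 1. */
public class ComponentConfig {
    private final Properties params = new Properties();

    public void set(String key, String value) { params.setProperty(key, value); }

    public String get(String key) { return params.getProperty(key); }

    /** Stores the current parameter settings to the given file. */
    public void store(File file) throws IOException {
        try (OutputStream out = new FileOutputStream(file)) {
            params.store(out, "component parameters");
        }
    }

    /** Reloads parameter settings, resuming from a stored configuration. */
    public void reload(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            params.load(in);
        }
    }
}
```

A full snapshot facility (item 2) would extend this idea from parameters to the complete runtime state of a component.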
Maintenance Support
Support for maintainability of systems built up out of a given component framework or toolkit, considered as distinct from full-blown development environments, comes in different guises. On the one hand, major design decisions reflected in high-level model choices can have significant impact, as can lower-level decisions with regard to modularity; encapsulation; information hiding; etc. On the other hand, maintenance support explicitly considered and covered in the accompanying documentation will be assessed (see also 0).
Extensibility Support
The fifth category comprises support for extension of the component framework and toolkit. Similarly to the previous point, there exist high-level design decisions that intrinsically favour extensibility, including the kinds of existing inter-component dependencies; or the generality and extensibility of the adopted overall design model. At a lower level, reusability of component interfaces plays a role of analogous import. Another important aspect at the micro-level is the documentation of extra-functional component properties, such as characterisation of behaviour; synchronisation; or quality of service provided (cf. 0).
Documentation
This final category complements the ones hitherto discussed by offering a structure according to the intended readership, as follows:
1) The architecture document: this standard document shall provide a concise summary of all relevant aspects of the artefact at hand. A partial list of typical views includes the design model as a representation of the application domain, encompassing a brief description of the major features and requirements with respect to hardware and software; a specification of assumptions and constraints of the approach taken; and a concise high-level data-flow and process diagram. The specification model covers functional requirements; external interfaces of the system to its environment; and externally observable behaviour. The object model provides more detailed information about the decomposition of the domain, which the functional model complements by providing a view of the uses of the defined entities. The dynamic model, again of particular relevance for real-time systems, provides information about timings; frequencies; latencies; and related characteristics (cf. 0), which can be completed by an additional compilation of performance parameters such as particular sensitivities or documentation of trade-offs considered;
2) Documentation for end users: this documentation set addresses the needs of toolkit users. Of particular relevance for members of these groups are structural descriptions differentiating between passive classes of data objects; active classes of possibly concurrently executing objects requiring coordination; protected classes managing exclusive access to data or other resources; etc. Behaviour descriptions detail the purposes and functionalities of single entities and assemblies; they describe the nature of the events processed, document the way they are handled, and prescribe the sequences needed for the generation of outputs; this may usefully be supplemented with a specification of the resources required for execution and of particular sensitivities. Finally, as also explained under 0, this documentation ought to cover any existing facilities for self-testing and monitoring of components and assemblies, and describe applicable methods of verification and validation, in particular of relevant real-time aspects;
3) Documentation for maintainers: in order to satisfy the needs of this target group, the information described in the previous item ought to be extended with additional support for problem fixing, e.g. for recovery from operating errors or system failures; adaptation to longer-lasting specific environmental set-ups as well as to circumstances of limited duration, such as transitory unavailability of some resources; and support for system evolution, both quantitatively, such as handling of increasing loads, and
qualitatively, e.g. integration of new components of either the framework or external COTS elements;
4) Documentation for developers (providers and integrators)
This class forms the final extension of documentation, providing additional information such as elements of code analysis; discussion of interesting directions and appropriate strategies for extension of the given code, architectural coverage or even the very architecture itself.
Specification of Priorities
Within the SAFIRA consortium, it is agreed that the toolkit under development is not a general, comprehensive set of straightforwardly reusable, commercial-quality components covering all aspects of real-time emotion. As already explained, the toolkit cannot possibly meet all these claims, on the grounds of the project's limited resources alone. Added to this are the qualification profiles of the consortium members, all of which are academic research institutions. To reiterate, SAFIRA is not a software engineering project cleaning up, standardizing, and making available well-understood components, but a research project at the cutting edge of what is understood about affective computing.
SAFIRA considers its primary product, the toolkit, to be a body of code that comes out of the research projects of the consortium members and holds much potential for others wishing to do research in the same area. It provides support for a developing research and development community in the affective computing area in the EU. It is for others to build upon this work to further research and to create commercial implementations; it follows that the code will conform to technical criteria such as those given in the present section only with the caveats expressed here. Consequently, the toolkit will be considered a success if:
• The code is correct and reasonably efficient.
• The code has the real-time performance needed to support the demonstrators.
• The code is commented, readable, and generates reasonable Javadoc documentation.
• The code catches obvious errors.
• Documentation is provided about how the code works and how to use it.
• The code could plausibly be useful for third parties, though they may need to rewrite parts if the domain is fairly different.

To explain further, aspects such as code verification are considered essential; in the same category, the aim is for acceptable code quality, while style coherency is excluded from the outset because the logistical requirements involved cannot be matched within the present project. Similarly, even if extremely good performance is not a goal of the project, performance adequate to support systems such as the demonstrators is considered essential. Fault tolerance should, for example, include the capability to deal gracefully with likely errors in XML Schema input.
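The graceful XML fault tolerance mentioned above might, for instance, amount to rejecting malformed input with a reported reason instead of crashing. A minimal sketch, checking well-formedness only (not schema validity); the class name is hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

/** Sketch of graceful XML fault tolerance: a malformed document is detected
 *  and reported rather than crashing the component. */
public class XmlInputGuard {
    /** Returns true if the input parsed; false (with a logged reason) otherwise. */
    public static boolean tryParse(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // Error detection and reporting, as called for under
            // "performance under abnormal conditions".
            System.err.println("Rejected malformed input: " + e.getMessage());
            return false;
        }
    }
}
```

Validation against an actual XML Schema would additionally require configuring the parser with the schema in question.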
Support for management, maintenance and extensibility is not considered a high priority of SAFIRA, but will be assessed in any case. The level of documentation aimed for is good Javadoc documentation, accompanied by a description of the "concept" of each module (comprising the state of the art and what the module is intended to provide, as in the semantic criteria in section 0), documentation of how each module works (in terms of algorithms and classes), and use-case information.
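The level of Javadoc aimed for can be illustrated with a small example; the class and its behaviour are invented purely to show the intended documentation style (concept, how it works, usage):

```java
/**
 * Concept: a minimal emotion-state holder, illustrating the level of Javadoc
 * aimed for, where each module documents its concept, how it works, and its
 * intended use. The class and its fields are invented for illustration only.
 */
public class EmotionState {
    /** Current valence (pleasantness), kept in the range [-1, 1]. */
    private double valence;

    /**
     * Shifts the current valence by the given amount, clamped to [-1, 1].
     *
     * @param delta the signed influence applied to the valence
     * @return the resulting valence after clamping
     */
    public double influence(double delta) {
        valence = Math.max(-1.0, Math.min(1.0, valence + delta));
        return valence;
    }
}
```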
TIME PLAN
An accompanying measure throughout the software development process, the evaluation of the software framework and toolkit will be carried out in a closely integrated fashion with the work carried out in the respective workpackages. All issues specified in the present evaluation plan will be taken into account as appropriate in the definition of single work phases and in the documentation of results. These outputs will be consolidated in the forthcoming deliverables of WP 7.
TASK 7.2: DEMONSTRATOR EVALUATION PLAN
Three different demonstrators are being developed in SAFIRA: the Influencing Machine, FantasyA, and the Wine Butler. They illustrate three very different aspects of affective interaction.
The Influencing Machine explores the possibilities for an intimate relationship between humans and computational 'beings'. A human user enters a room onto whose walls child-like drawings are projected, being drawn in real time. Postcards with pictures from art history lie scattered in the room. Users can send the agent postcards by putting them in a mailbox. The picture on the postcard affects the agent's emotional state, which in turn changes the pictures being drawn. The agent goes through a development in which the pictures drawn progress from those that could be drawn by a 9-month-old child to those of a 5-year-old.
In FantasyA the user plays a role-playing game. The different agents in the game have emotional states. To advance in the game, the user has to understand the different agents’ emotional states, get magical tools, and then fight in magical fights. The agents are embodied in 3D. The user has a plush toy with sensors that s/he can use to direct his/her character and express its emotions.
The Wine Butler sells wine. The agent will use affective facial, speech and body expressions. It will be modelled on a real human wine seller and will attempt to be believable.
All three demonstrators have emotion models, and all three express emotions to the user, but in quite different ways. In the Influencing Machine, the emotions are expressed through child-like drawings, and the relationship between the input of the postcards, the emotional state, and the drawing is fuzzy and intriguing. In the FantasyA scenario, the user's task is both to master the plush toy in order to create the appropriate emotional expression and to understand other agents' emotional expressions in order to advance in the game. Finally, in the Wine Butler scenario, face and speech are going to reflect the agent's emotional state and establish a seller-buyer relationship with the user. The aim is to make the user relate to the agent in a positive way.
One obvious challenge for workpackage 7 is to find ways of checking whether the expressed emotions are understood by users and whether the system can understand user emotions. More challenging is to determine whether these quite unusual usage scenarios are engaging, fun, mind-boggling and innovative, and how much of this can be attributed to the emotion modelling and expression. Our intention is both to contribute to the general understanding of how to design for affective interaction and to feed specific results back into the design cycles of the three demonstrators.
Let us start by providing an introduction to the field of affective interaction and the hypotheses and claims made in it (see also deliverable 4.1, André et al. 2001, from the SAFIRA project). We then turn to a small literature summary of the few user studies done of affective interfaces. From these we extract some criteria that need to be taken into account when studying affective systems. Finally, we outline a plan for the evaluation of each of the demonstrators.
THE ROLE OF AFFECT IN INTERACTION
Before we can discuss what affect contributes to interaction, we need to make a couple of notes on fuzzy terms that make readings in this area quite difficult. First of all, the term affective refers to something more than emotions, including personality factors and mood. What the terms emotion, personality or mood exactly mean is not really agreed upon in this field. Some define emotions as a function that evaluates different alternative actions and makes it possible for an agent to choose between them. Others define emotions as the innate, unconscious, basic emotions such as fear, sadness or happiness, and explore the effects of empathy. Others again try to capture and understand the whole complex interaction between basic, innate emotions, bodily, somatic reactions, and higher-level cognitive reasoning about emotions, and the relationship back and forth between the two. Deliverable 4.1 of the SAFIRA project provides a survey of the existing theories of emotion [André et al., 2001b]. Affective interaction, in turn, is still a quite undefined term, covering various aspects of using affect in the interaction between user and system. In Picard's book "Affective Computing" [Picard 1997], it included creating systems that raise emotions in users, model users' emotional states, or actually 'possess' emotions and behave as such. The focus was on how to utilise our understanding of emotions in order to produce better human-computer interaction, an interaction that would take user emotions into account. Paola Rizzo [Rizzo 1999] provides an interesting analysis of a number of (implicit) hypotheses made in the affective interaction field, and convincingly shows that there is still much room for innovation that does not necessarily rely upon creating believable characters. It is not only through cartoon characters using facial, bodily or voice-based affective expressions that a system will be able to engage the user in an affective interaction.
This might very well be accomplished through other means – especially as people seem so willing to ascribe human characteristics to any system [Reeves and Nass 1996].
Rizzo brings up a number of other interesting issues, among them the fact that empathy is not the only kind of relationship that can exist between user and character. We sometimes laugh at a character that falls over. In general, her analysis opens up a whole range of possible affective interaction scenarios – several of them connecting emotions to a larger narrative context in which they are given their meaning and can be understood. The fields of affective interaction and narrative intelligence/interaction have a lot in common. In SAFIRA, we investigate several different metaphors for interaction with affective interfaces through the demonstrators. The user studies of these, described below, will increase our understanding of whether and how users understand how to interact “affectively”. Already existing interaction metaphors developed by the research community were discussed in deliverable 4.1, and included: Agents for help and learning, Agents for delegation, The subjective focus, The dialogue partner, Emotional behaviour and Agents as user representatives.
PREVIOUS STUDIES
There are very few user studies of the short-term and even fewer of the long-term effects of affective interaction. On the other hand, designers of artefacts, artists, musicians, writers, people in advertising, and more recently web-designers and game designers, have played around with evoking emotions for ages. What differs here is the interaction between the artefact aimed at raising emotions or expressing emotions and the viewer/listener/reader/user reactions and actions at the interface. The user will be involved in the loop in a more active manner.
Some of the work in SAFIRA is focused on implementing affective interaction through interactive characters, but affective interaction may also be realised in various other ways. In deliverable 4.1 [André et al 2001], a survey of studies of interactive characters is provided, also inserted below. In many affective interaction scenarios (besides interactive characters), the goal is to entertain. The HCI community has only recently started to debate how to take those characteristics into account when performing usability studies or providing input to design. These aspects are sometimes referred to as hedonic usability factors [Hassenzahl et al. 2000] or pleasure-based human factors. Recently, the entertainment effects of being an active participant have been debated in the HCI community [Pinhanez et al. 2001, Karat et al. 2001]. Early results show that a number of different interaction design schemes are needed – some with more active interaction (games, MUDs, chats), some more passive (web entertainment) – if we want to entertain. We might want to differentiate between entertainment and play. In both, affect and emotions come into play, often as a means to build relationships.
Studies of Interactive Characters
Interface characters have been much criticised and debated in the HCI community [Shneiderman 1997, Lanier 1996, Suchman 1997]. They are said to violate good usability principles, to obscure the line of responsibility between human and machine, and to confuse both designers' and users' understanding of the computer's abilities and inner models of events. The proponents, on the other hand, regard these properties as opportunities rather than as reasons to avoid characters in the interface [Höök 2000, Waern and Höök 2000].
Motivational Effects
A number of studies have examined the ways in which characters enhance engagement and encourage exploration of a given information space, mostly in relation to learning and creativity. Such motivational effects were studied by van Mulken and colleagues [Mulken et al. 1998], who compared two versions of the presentation system PPP Persona: one with and one without a character. The study showed effects neither on recall of the presentation nor on how the presentation was understood (objective measurements). However, it revealed a positive effect on the subjective estimation of whether the explanation was difficult or not: subjects experienced the explanation as simpler with the PPP Persona character than without it. Van Mulken and colleagues named this 'the persona effect'.
Another similar study looked at the persona effect for ‘Herman the Bug’, a pedagogical agent that helps students to create an ecological micro world system with plants, light and air [Lester et al. 1997]. Here five different clones of the agent were compared, and the study revealed a persona effect – a strong positive effect on the students’ perception of their learning experience. The animated character also had an effect on learning.
In a study by Wright et al. [Wright et al. 1998], a plain textual explanation of a medicine was compared to one with the same text accompanied by an animated dragon illustrating the different threats to the blood system. Here a negative effect appeared on how much was remembered afterwards; the dragon disturbed subjects rather than aiding them.

(Footnote: A workshop on Pleasure-Based Human Factors was held in Copenhagen in April 2000; the proceedings will be published in a forthcoming book. The workshop is now held in Singapore under the name Affective Human Factors Design. http://www.unimas.my/cahd2001)
These conflicting results (PPP Persona and Herman the Bug versus the dragon study) point to the need for a better understanding of the design of synthetic characters, in order to make use of their potential to encourage learning and exploration while avoiding the scenario in which the character distracts and disturbs the learning process. This involves, we think, a better understanding of the features of, and relationship between, wayfinding and exploration activities.
As pointed out by Andrew Stern [Hayes-Roth et al. 1998], the designer of the Catz and Dogz system, the artistic design and practical understanding of the creation of synthetic characters is crucial in determining the success of a system. A similar point is made in [Elliott & Brzezinski 1998] when they cite [Lester et al. 1997]:

"Lester gives the examples of, on the one hand, a humorous, lifelike, joke-cracking character that ultimately impedes problem solving through his distracting presence; and on the other, a dull assistant that always operates appropriately but yet fails to engage the student. When communications from an agent must be coordinated to be both engaging and purposeful, issues in timing and the multi-layering of actions arise."
Anthropomorphic Effects and Believability
Another effect of synthetic characters is that they tend to raise expectations of anthropomorphism of the system [Reeves & Nass 1996]. Such anthropomorphic effects seem to have many dimensions. On the one hand, the user may expect the system to be intelligent and cognitively potent: [Brennan & Ohaeri 1994] showed that users talked more to an anthropomorphic interface; [King and Ohya 1995] showed that users attributed more intelligence to anthropomorphic interfaces; and [Koda & Maes 1996] showed that realistic faces are liked and rated as more intelligent than abstract faces. Opponents of synthetic characters argue that raised anthropomorphic expectations may lead to frustration when the system cannot meet them [Shneiderman 1997]. For instance, the presence of a talking face might lead the user to expect the system to possess natural language and dialogue competence, which no system of today can live up to. The general conclusion is that the more 'natural' the interface, the higher the expectations of intelligence in the system.
CRITERIA FOR SUCCESS
While more traditional user interfaces have found their usability criteria (such as efficiency related to users' tasks, number of errors, or learnability), affective interaction still needs to explore which criteria determine whether the affective aspects of the interaction do in fact contribute to the success of the system. Furthermore, affective interaction systems must, similarly to intelligent user interfaces [Höök 1998], be evaluated in two steps. First, we must make sure that end users understand the emotions expressed by the system, or that the emotions expressed by users are understood by the system. Second, we need to check whether this in fact leads to the desired effects on the overall interaction. It might be that the design of an affective interactive character is perfectly valid, but the facial emotional expressions of the character are hard to interpret; the overall design then fails anyway. Or, the other way around, the emotional
expressions might be easily understood by the user, while the design still fails to achieve its overall goal of entertaining or aiding the user.
Thus, for all three demonstrators we are going to pose three different questions:
1) Do users understand which emotions are expressed by the system, be it through speech, child-like drawings, facial expression, or bodily behaviour?
2) Does the system correctly interpret users' emotional states expressed through a plush toy, art postcards, or the dialogue with the system?
3) If the answer to both of the previous questions is "yes", do the overall goals of the affective parts of the system contribute to the interaction? For each of the demonstrators we discuss the overall goals below.
It is not really possible to separate the affective parts of the interaction from the rest of the design, but our task here is to try to focus on the aspects of the overall goal that relate to the affective parts of the system. All three demonstrators are semi-autonomous or autonomous agents. For agents interacting with end users, we have the following crucial design problems, see [André et al. 2001b]:
• Agents need to display behaviour and affective expressions in such a manner that the user understands them. This means that they cannot always act in the most efficient, rational way; instead they might have to act in ways that convey to the user what is going on [Sengers 1998]. This also holds for agents that work in multi-agent systems but whose results have to be communicated to a user in the end.
• Agents need to be timely. When an emotion is displayed to the user, it has to come at the right point in time and last for an appropriate length [Ruttkay et al. 2000]. If an affective response from the user is the aim, then the interaction has to be carefully paced so that the user can follow it without being bored or puzzled.
• Agents sometimes need to have interesting personalities. Only then will their emotional behaviour be comprehensible and interesting to the user. Conveying the personality might be difficult if the interaction with the user is limited. This is where idle behaviour or interaction between several agents can come into play. When several agents interact, they can take the opportunity to show more of their personality traits.
• For some affective agent situations, it is necessary to create a narrative context (a situation, an interaction history) in order to understand the emotional behaviour.
• If the agent is used over a longer time span with a user, different personalities and attitudes might be needed in order to fit the needs of different users [Nass & Gong 2000].

Let us now turn to each of the demonstrators and discuss their specific goals and criteria of success.
EVALUATION OF THE INFLUENCING MACHINE
In the influencing machine, the three questions from above can be rephrased as:
1) Do users understand which emotions are expressed by the system through the drawings? Do users understand that the inner emotional life of the influencing machine develops over time, similar to how a child develops?
2) Do the users understand that the postcards in fact represent their means to indirectly influence the system's emotional and developmental stage?
3) The main goal of the influencing machine is to entertain, to charm the user, and to allow the user to explore the emotions of the influencing machine as well as their own emotions.
For the first question, we might ask some more detailed questions. The goal of the postcard-reading component is to allow the user to influence the agent's emotions in an understandable way over the course of the interaction. It does this in a straightforward way, by mapping each postcard to its emotional "content" and then sending these influences to the emotional model. In this context, the following things are interesting to test:
• What kind of difference to the user experience does it make if the user uses physical postcards and a physical mailbox versus virtual postcards on the screen that are clicked on?
• The mapping from postcards to emotions is context-free, i.e. the mapper does not keep track of the history of the interaction. Is this adequate for understandable emotional influences, or would it be markedly better to include narrative, temporal effects in the mapping?
• Can the users understand that they are influencing emotions, or is this too hard as an interface?
For the second question, the goal of this evaluation will be to determine whether the behavioural system is effective at displaying a range of chosen emotions, tentatively the following: Anger vs. Peace/Calm, Smothered vs. Free, Action-oriented vs. Being-oriented, Spiritual vs. Physical/Practical/Material, Uncertain/Insecure vs. Assured, Dependent vs. Independent, Primal unity with world vs. Sense of Self, Control/Stasis/Rigidity vs. Flow/Dynamics, Warm vs. Cold, Happy vs. Sad, Aggressive vs. Passive, Rational vs. Passionate, Introverted/Asocial vs. Extraverted/Reaching out to others, Safe vs. Fearful, and Energetic vs. Weak. The emotions will be expressed both through the contents of the drawings and through the way they are drawn – the dynamics of the drawings.
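To make the context-free mapping concrete, the sketch below (in Python, with hypothetical postcard names, dimension labels, and gain value – the actual component's vocabulary is not specified in this document) maps each postcard to a fixed influence vector over a few emotion dimensions and nudges the current emotional state accordingly:

```python
# Context-free postcard-to-emotion mapper (sketch; all names illustrative).
# Each postcard id maps to a fixed influence vector; no interaction
# history is kept, matching the "context-free" design discussed above.

DIMENSIONS = ("anger_peace", "happy_sad", "warm_cold", "energetic_weak")

# Assumed postcard catalogue: values in [-1.0, 1.0] push the emotional
# model toward one pole of a dimension.
POSTCARD_INFLUENCES = {
    "sunny_beach": {"happy_sad": 0.8, "warm_cold": 0.6},
    "storm_at_sea": {"anger_peace": -0.5, "happy_sad": -0.4},
}

def map_postcard(card_id):
    """Return a full influence vector for one postcard, context-free."""
    influence = dict.fromkeys(DIMENSIONS, 0.0)
    influence.update(POSTCARD_INFLUENCES.get(card_id, {}))
    return influence

def apply_influence(state, influence, gain=0.3):
    """Nudge the current emotional state toward the postcard's content."""
    return {d: max(-1.0, min(1.0, state[d] + gain * influence[d]))
            for d in DIMENSIONS}
```

Adding the narrative, temporal effects raised in the second bullet would amount to making `map_postcard` a function of the interaction history as well as of the card itself.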
Finally, for the third question, we are going to check both users' immediate reactions to the influencing machine and their subjective evaluations after using it.
Of the design issues for agents, the issues of timing and narrative context in particular are relevant to the influencing machine. The pace at which the user inputs postcards will matter to the system, and the pace at which the drawings are generated will matter to the user's interpretation of them. The narrative context of the influencing machine is possibly unclear. Thus we are going to test how users interpret the influencing machine and which metaphors they employ to describe their experience of the system [Maglio & Matlock 1999, Höök et al. 2000].
Method
The study will be qualitative, and its main focus is on bringing feedback back to the design and, with some luck, on the level of the chosen "metaphor" for interaction.
We shall bring in groups of subjects that together explore the Influencing Machine. Our intention is to bring in five to eight different groups with two to four people in each. The groups will be varied so that some consist of older people, some of people interested in the arts, some of people who know each other, and others of strangers to one another. The reasons we want groups of subjects are:
1) it is closer to a realistic usage situation
2) people in groups will talk to one another and thus we get interesting data
3) people in groups will show more emotional facial expressions than would a single user in front of a machine (experience from the Agneta & Frida study [Höök et al. 2000]).
We feel that it is important that the room in which the Influencing Machine (IM) is placed is a calm setting with no other distractions: a high table on which the machine is placed rather than an office desk, no chairs, and postcards scattered around the room. The environment should signal: this is different, this is not an office, this is an arty kind of system. The influencing machine itself will not look like a PC.
The subjects are first interviewed about their attitude towards computers, characters in the interface, arty installation stuff, humour, children, and other demographic information.
They enter the room in groups of two to four people (see above). They interact as they please with the machine (it is always reset to its initial developmental stage before each group enters the room). They continue using it for as long as they please. We videotape them.
Afterwards we ask a set of questions in an open interview form. These questions are carefully designed not to "reveal" the metaphor of the IM – not to talk about the machine as a machine or as a s/he. Subsequently, we are going to study the language the subjects use to look for metaphors.
We then expose them to some portions of the videotapes showing the drawings being drawn and ask them to classify the "emotion" expressed in each drawing. This will probably also be open-ended and generate discussion.
Time Plan
We are planning to perform the study starting on the 1st of June, as soon as the first version of the influencing machine is working. The analysis of the results will be available by the end of August (at the latest).
5 Though we use the word "subjects", we really look upon them as visitors in an art gallery as well as co-designers providing input to the design.
EVALUATION OF FANTASYA
A description of the game plot of the FantasyA game can be found in Annex I. Basically, the user expresses his/her emotions by enacting them with a plush toy. The user's avatar then expresses the emotions, and this in turn influences the other agents in the game. If the user is able to establish the right relationships, s/he can get further in the game, moving through the portals to other islands and establishing relationships with the different characters.
We can divide the evaluation of FantasyA into three main parts: the evaluation of affective control of the characters, the evaluation of the expression of the characters (face, body, and even speech) and the evaluation of the impact that such emotional elements will have in a game scenario, where collaboration is necessary.
The three questions from above can be rephrased for FantasyA in the following manner:
1) Do users understand which emotions the other characters in the game express?
2) Does the system correctly interpret users' emotional state expressed through the plush toy?
3) Do users have fun and feel involved with FantasyA? Do users form an empathic relationship with some character in the game?
Concerning the first issue, the characters in the game will have facial and body expressions that will be tested to see whether they convey the emotions we want. The process of generating the emotional expressions is done in two phases: first, the designer creates the character's behaviours and postures (including the facial expressions); then, the dynamics of the character's emotions are generated in real time.
So, we will have to test two elements: the postures and behaviours created by the designers (do they convey the emotions we want?) and the "bending" of postures and behaviours generated in real time (again: do they convey the emotions we want?).
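As an illustration of the second element, the real-time "bending" could work roughly as follows: each designer-made extreme posture is blended with the neutral posture, joint by joint, according to the current emotion intensity. The joint names and angle values below are assumptions for the sketch, not the actual FantasyA character model:

```python
# Real-time "bending" of a designer-made posture (sketch; joint names
# and angles are assumptions). Each joint angle is linearly interpolated
# between the neutral posture and the designer's extreme emotional
# posture, weighted by the current emotion intensity.

NEUTRAL = {"spine": 0.0, "head_pitch": 0.0, "shoulders": 0.0}
SAD_EXTREME = {"spine": -20.0, "head_pitch": -35.0, "shoulders": -10.0}

def bend_posture(neutral, extreme, intensity):
    """Blend joint angles (degrees); intensity is clamped to [0, 1]."""
    intensity = max(0.0, min(1.0, intensity))
    return {joint: (1.0 - intensity) * neutral[joint]
                   + intensity * extreme[joint]
            for joint in neutral}
```

For example, `bend_posture(NEUTRAL, SAD_EXTREME, 0.5)` yields a half-sad posture; the classification studies described below would then test whether users still read such intermediate postures as "sad".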
Concerning the second issue, the affective input through the plush toy, we fear that it might be difficult for users to "get at" the right relationship between touching the sensors and producing the right emotional state in their avatar. Users might be able to grasp how to grab and move the plush toy if they get immediate feedback through the avatar's expression. We aim to test this in several phases, see below.
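A rough sketch of the sensor-to-emotion step under test: simple threshold rules over the toy's sensor readings produce a coarse emotion label, which would then drive the avatar's expression as immediate feedback. The sensor names and thresholds are illustrative assumptions, not the actual SAFIRA toy design:

```python
# Rule-based inference of the user's intended emotion from plush-toy
# sensor readings (sketch; sensor names and thresholds are assumptions).

def infer_emotion(readings):
    """Map raw sensor readings to a coarse emotion label."""
    squeeze = readings.get("squeeze_pressure", 0.0)  # normalised, 0..1
    shake = readings.get("shake_frequency", 0.0)     # shakes per second
    tilt = readings.get("tilt_angle", 0.0)           # degrees from upright

    if squeeze > 0.7 and shake > 2.0:
        return "angry"         # hard squeeze while shaking
    if shake > 2.0:
        return "excited"       # vigorous movement alone
    if tilt < -60.0:
        return "sad"           # toy slumped or hung upside-down
    if squeeze > 0.5:
        return "affectionate"  # firm hug without shaking
    return "neutral"
```

The Wizard-of-Oz phase described below is precisely what would tell us whether rules of this shape are tenable, or whether individual differences between users are too large for fixed thresholds.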
Finally, concerning the entertainment value of the overall FantasyA game with respect to the affective aspects, there are a couple of concepts that we are curious to explore. One is the sense of flow. Another is the sense of game play. This concept is often used by game designers to denote the feeling, unique to computer games, that is achieved neither through beautiful graphics alone nor through an interesting plot or narrative, but through a combination of the two together with a certain timing, tempo, and good input devices (such as a joystick). If the game keeps getting stuck, the graphics become slow, or the plot becomes unclear or incoherent, the sense of game play disappears. Our intention is to try to grasp some of the elements of this concept and then check to what extent FantasyA appeals to users.
Concerning building an empathic relationship with the user, we first need to discuss what empathy entails (see for example the discussion in [Persson et al. 2000]).
Method
To check the plush toy input, we divide the evaluation into three quick-and-dirty tests:
1st Phase: an evaluation where we videotape people who try to express emotions through a toy without any sensors. The experimenter watches the subject, tries to infer what he/she is trying to express, and then returns feedback by setting the avatar's emotional state – a Wizard-of-Oz kind of study. The reason for doing this is that it may give us input on various user behaviours with the plush toy and on how hard or simple it is for a human being to interpret them. It could also provide some input on the relationship between user behaviour and feedback through how the avatar expresses the emotion.
We would also learn how large the individual differences are in how users behave with the plush toy or another, more abstract, object. If there are consistencies, it will be easier to design the toy's sensors.
2nd Phase: In this phase we plan to evaluate the real toy, with the sensors controlling one of the synthetic characters, trying to change its emotional state and its behaviour. Note that this experiment will be quite similar to the previous one, the main difference being that the user will now really be controlling the character.
The subjects will again be videotaped so that we can analyse how well they can control the emotions of the character. This evaluation will also include a questionnaire for subjective evaluation.
3rd Phase: In this phase the same evaluation can be performed, but now the control will be over a character inhabiting the FantasyA game. This will be part of the larger study of the FantasyA game. In parallel we shall perform simple studies where users are asked to classify the emotions being expressed by the characters. This type of evaluation will also tell us how effective the method for bending emotional postures and gestures is.
The final evaluation will be done with FantasyA, taking into account all the elements already evaluated (affective input and affective expression). In this final evaluation we will try to answer the following questions (on top of the issues already tested):
• Does the user have fun?
• Does the user feel a sense of presence (involvement in the game)?
• Can the user communicate with the other characters in the game?
• Can the user understand the emotional state of the other characters in the game?
• What is the role of the emotions in capturing the user's involvement?
• Does the user show empathy with any character? What elements make for that empathic relation?
We are currently discussing whether this should be a comparative study (with vs. without emotional expressions) or not.
Time Plan
The first study of the plush toy will be performed during May 2001. The second phase will take place during autumn 2001, October – November. The final evaluation will take place during spring 2002, probably starting in March.
EVALUATION OF THE WINE BUTLER
This demonstrator will deploy a conversational electronic Personal Sales Assistant (e-PSA) embodied in a lifelike animated character. The character is endowed with the ability to communicate through multiple modalities, which include written dialogue, spoken language, and facial and body gestures and expressions. Acting as personal sales assistants, these characters can connect with search engines to find the right product for each customer. They are given the ability to effortlessly answer customers' questions at any point in a sales cycle, and also to give advice and recommendations. The aim is to establish a business-to-customer relationship and to continue to develop it by providing and maintaining accurate customer profiles and by monitoring customer preferences, shopping patterns, and habits. These assistants offer several advantages: personalised information delivery; collaborative marketing and sales; relevant, reliable market and customer data. Given the added ability to sense and accommodate Affect, they are better able to reason about and relate to personalised interaction and to adapt the interface accordingly.
Affect is mainly used to provide believability of the visual character behaviour and the multi-modal expression, which may help provide a social context to the overall interaction. We work under the assumption that, for the e-PSA interface agent to be effective, it must not only be context sensitive (by having and acquiring knowledge about products, vendors, and, moreover, the customer), but also appear believable in terms of the conversational content, the information delivery, and the visual and/or spoken behaviour that is related to the perceived mental state.
The e-PSA will take on two roles and is required to provide believable, context sensitive behaviour and responses in a conversational manner:
1) the Helper role, where the e-PSA will provide help in using the e-commerce system functions and services within the wine domain; and
2) the Recommender role, where it will have extensive knowledge about the product domain, will acquire knowledge about customer requirements, likes, and dislikes, and will be capable of recommending products based on its learnt knowledge of the customer.
In terms of the three questions from above, the second is not really applicable, as the user can only express his/her emotions indirectly through the dialogue. The other two questions can be reformulated as:
1) Do users understand the emotions expressed by the Wine Butler through speech, face, and
gestures? Do the emotions contribute to making the character believable?
2) If the character is believable, will this in turn make customers more loyal and more likely to return to the store? Will it increase (or decrease) trust in the site? Will it be perceived as more fun to shop with the Wine Butler than without?
An underlying assumption in other research in this area, which might not necessarily lead to the best results, is that the more natural the behaviour, the more believable the character – something debated by [Rizzi 2000]. In fact, cartoon-like, overly expressive behaviours might very well be relevant in order to catch interest and keep the user in the illusion of a believable character.
Let us propose initial questions that may provide indicators for the evaluation objectives set out above. Given answers (or at least indicators) to these questions, we will be in a better position to make statements on the following:
• Likeability of, or satisfaction from, using a character.
• Acceptance of characters as a form of interaction (are they liked, or are they considered irritating?).
• Emotional expression: Do emotions matter? If they do, what types of emotions are most adequate?
• What is the impact of the affective behaviour and reasoning on the quality of service provided?
• Degree of frustration: Does the agent affect user frustration with the system and the interaction? Do users consider these agents annoying? Do users develop other reactions to these characters?
• Expectation: To what extent does expectation affect the interaction pattern? How does this vary with different intensity in the appearance of the characters?
• Trust: Given a situation, can the system reason about the emotions it is likely to generate, and apply the same reasoning to situations it is in? What impact may this have on the trustworthiness of the agent? Is such reasoning reliable?
• Intelligent behaviour: Do users like agents to act intelligently? To learn about the user? To relate to the user?
• What roles are such agents best suited for? In what form or shape?
• Are there positive indicators of the usefulness of such agents for online retail applications?
One of the objectives here is to determine the impact of agents with Affect. The experiments are aimed at answering a number of questions about the impact of using intelligent agents with affect. A sample of the type of general questions the experiment is set up to answer is:
• Do users prefer visual agents that show gestures and facial expressions to some extent?
• Do users want agents for tasks related to the collection and compilation of personal information?
• Do users show interest in employing agents for e-commerce tasks such as buying a provider service?
• Do users want the agents to be fun to use?
• Are users willing to provide the agent with information about their personal political attitudes and/or their financial situation?
• Do users want to give the agent the required information explicitly by direct input? And/or would users like an agent that has learnt by observing the user, that achieves the task on the user's behalf, and/or that shows the user how the interaction can be achieved more effectively?
We refer to loyalty, here, as the perception of loyalty by which customers are encouraged to return to an electronic-based retail system and/or the store. Potentially, loyalty will encourage continuous customer return, and continuous return may lead to increased sales, which is the objective of online retail stores.
Method
The goal of the evaluation will be to analyse both system-related and user-related factors that may result from attributing lifelike characteristics to the e-PSA. The evaluation process may be conducted on the four levels below; further implementation will determine the actual evaluation and test settings and scenarios:
1) Technology comparison: comparing three identical systems that are equipped with different interfaces: a) a direct manipulation metaphor, b) a visual animated interface agent, and c) an affect-based interface agent metaphor. Do such interface agents have an impact on the interface and the system? What impact do these agents have? Is this impact related to the affect-behaviour they can exhibit? Do anthropomorphic attributes promote effective system capabilities?
2) Validation of the enabling technology in a dynamic setting, evaluating the underlying architecture and mechanisms.
3) In one sense this is implicitly evaluated through the generation of appropriate emotions governed by a set of personality traits, as evaluated in item 4. This level will evaluate how reliable the inferences from the defined content language are, as well as whether the proposed communicative acts can effectively and correctly communicate the appropriate behaviour.
4) Affect system, behaviour, and representation.
Time Plan
The evaluation will start in mid-October 2001 and be finished by the end of February 2002.
REFERENCES
[Abowd et al. 1996] Abowd G., Bass L., Clements P., Kazman R., Northrop L., Zaremski A.:
Recommended Best Industrial Practice for Software Architecture Evaluation. Carnegie-Mellon University, Software Engineering Institute, Pittsburgh, PA, CMU/SEI-96-TR-025 (1996) [André et al. 2001] André E., Arafa Y., Botelho L., Farukuoye M., Figueiredo P., Gebhard P., Guerin
F., Höök K., Kamyab K., Kulessa T., Martinho C., Paiva A., Petta P., Ramos P., Sengers P., Staller A., Vala M.: Deliverable D2.1: Component Specification for a Toolkit to Support Affect in Real-time Applications. SAFIRA Project IST-1999-11683 (2001) [André et al., 2001b] André E., Arafa Y., Botelho L., Farukuoye M., Figueiredo P., Gebhard P.,
Guerin F., Höök K., Kulessa T., Martinho C., Paiva A., Petta P., Ramos P., Sengers P., Staller A., Vala M.: Deliverable D4.1: Abstract specification of emotion and appraisal to be used within the framework. SAFIRA Project IST-1999-11683 (2001) [André et al., 2001c] André E., Arafa Y., Gebhard P., Geng W., Kulessa T., Martinho C., Paiva A.,
Sengers P., Vala M.: Deliverable D5.1: Specification of Shell for Emotional Expression. SAFIRA Project IST-1999-11683 (2001) [Bachman et al. 2000] Bachman F., Bass L., Buhman C., Comella-Dorda S., Long F., Robert J.,
Seacord R., Wallnau K.: Volume II: Technical Concepts of Component-Based Software Engineering. Carnegie-Mellon University, Software Engineering Institute, Pittsburgh, PA, CMU/SEI-2000-TR-008 (2000) [Badler et al. 1993] Badler N., Phillips C., Webber B.: Simulating Humans: Computer Graphics
Animation and Control. Oxford University Press (1993) [Badler et al. 2000] Badler N., Zhao L., Costa, M., Vogler C., Schuler W.: Modifying Movement
Manner Using Adverbs. Proc. Workshop 12: Communicative Agents in Intelligent Virtual Environments, Autonomous Agents 2000 (2000) 7-12 [Blumberg & Russell 1999] Blumberg B., Russell K.: Behavior-Friendly Graphics (1999)
[Bowers et al. 1996] Bowers J., O’Brien J., Pycock J.: Practically accomplishing immersion:
Cooperation in and for virtual environments. Proceedings of the ACM 1996 Conference on Computer Supported Cooperative Work (CSCW ’96), ACM Press, Cambridge, MA, (1996) 380-389 [Brennan & Ohaeri 1994] Brennan, S.E., Ohaeri, J.O. Effects of Message Style on Users’ Attributions
toward Agents. Conference Companion, CHI’94, Boston (1994) [Brown et al. 1995] Brown A.W., Carney D.J., Clements P.C., Meyers B.C., Smith D.B., Weiderman
N.H., Wood W.G.: Assessing the Quality of Large, Software-Intensive Systems: A Case Study. Proceedings of the 5th European Software Engineering Conference (ESEC '95), Sitges, Barcelona, Spain, 6-10 Sept. 1995 (1995)
[Burton 1997] Burton E.: Artificial Innocence: Interactions between the Study of Children's Drawing and
Artificial Intelligence. Leonardo 30(4) (1997) 301-309 [Burton 1997b] Burton E.: Representing representation: artificial intelligence and drawing. Computers &
Art. Intellect Ltd., Exeter (1997) [Cassell & Vilhjalmsson 1999] Cassell J., Vilhjalmsson H.: Fully embodied conversational avatars:
Making communicative behaviors autonomous. Autonomous Agents and Multi-Agent Systems 2(1) (1999) 45-64 [Cohen 1995] Cohen H. The further exploits of AARON, Painter. Stanford Electronic Humanities
Review 4(2): Constructions of the Mind. Updated July 22, 1995 (1995) http://shr.stanford.edu/shreview/4-2/text/cohen.html
[Ekman & Friesen 1978] Ekman P., Friesen W.: The Facial Action Coding System. Consulting Psychologists Press, Palo Alto (1978)
[Ekman 1982] Ekman P.: Emotion in the Human Face. Cambridge University Press (1982)
[Elliott & Brzezinski 1998] Elliott C., Brzezinski J.: Autonomous Agents as Synthetic Characters. Communications of the ACM (1998)
[Gratch 1999] Gratch J.: How to make your planner rude and other issues in multi-agent planning.
Information Sciences Institute, CA, Research Report ISI/RR-99-464 (1999)
[Gratch 1999b] Gratch J.: Why you should buy an emotional planner. In Velasquez J.D. (ed.): Emotion-Based Agent Architectures (EBAA'99). Workshop at the Third International Conference on Autonomous Agents (Agents '99), Seattle, WA, USA, May 1, 1999, ACM Press (1999) 53-60
[Gratch 2000] Gratch J.: Émile: Marshalling Passions in Training and Education. In Sierra C. et al.
(eds.): Proceedings of the Fourth International Conference on Autonomous Agents (Agents 2000), Barcelona, Catalonia, Spain, June 3-7, 2000, ACM Press (2000) 325-332 [Hassenzahl et al 2000] Hassenzahl M., Platz A., Burmester M., Lehner K.: Hedonic and Ergonomic
Quality Aspects Determine a Software's Appeal. Proceedings of CHI'2000. ACM, The Hague (2000)
[Hayes-Roth et al. 1998] Hayes-Roth B., Ball G., Lisetti C., Picard R.W., Stern A.: Panel Session:
Affect and Emotion in the User Interface at the Intelligent User Interface Conference (IUI’98), San Francisco. ACM Press (1998) [Höök 1998] Höök K.: Evaluating the Utility and Usability of an Adaptive Hypermedia System. Journal
of Knowledge-Based Systems 10(5) (1998) [Höök 2000] Höök K.: Steps to take before IUIs become real. Journal of Interaction with Computers
12(4) (2000) [Höök et al. 2000] Höök K., Persson P., Sjölinder M.: Evaluating Users' Experience of a Character-Enhanced Information Space. Journal of AI Communications 13(3) (2000) 195-212
[Jazayeri 1995] Jazayeri M.: Component Programming - a fresh look at software components. Technical
University of Vienna, Distributed Systems Department, TUV-1841-95-01 (1995) [Johnson et al. 1999] Johnson M.P., Wilson A., Blumberg B., Kline C., Bobick A.: Sympathetic
interfaces. Proceedings of the Conference on Human Factors in Computing Systems (CHI 99), ACM Press (1999) 152-158 [Karat et al. 2001] Karat C.-M., Pinhanez C., Karat J., Arora R., Vergo J.: Less Clicking, More
Watching: Results of the Iterative Design and Evaluation of Entertaining Web Experiences. To appear in Proc. of Interact'2001, Tokyo (July 2001) [Kazman 1995] Kazman R.: AT&T Best Current Practice: Software Architecture Validation. University
of Waterloo, Computer Graphics Lab, Canada (1995) [King & Ohya 1995] King W.J., Ohya J.: The representation of agents: a study of phenomena in virtual
environments. Proc. of the 4th IEEE International Workshop on Robot and Human Communication RO-MAN’95 Tokyo, Japan (1995) [Kirsch 1999] Kirsch D.: The Affective Tigger: a study on the construction of an emotionally reactive
toy. MIT, MSc. Thesis in Media Technology (1999) [Klein & Kazman 1999] Klein M., Kazman R.: Attribute-Based Architectural Styles. Carnegie-Mellon
University, Software Engineering Institute, Pittsburgh, PA, CMU/SEI-99-TR-022 (1999) [Koda & Maes 1996] Koda T., Maes P.: Agents with Faces: The Effects of Personification of Agents.
Proceedings of HCI'96, London, UK (1996) [Lanier 1996] Lanier J.: My problems with agents. Wired (1996)
[Lester et al. 1997] Lester J., Converse S., Stone B., Kahler S., Barlow T.: Animated Pedagogical
Agents and Problem-Solving Effectiveness: A Large-Scale Empirical Evaluation. In Proceedings of the Eighth World Conference on Artificial Intelligence in Education, Amsterdam, the Netherlands. IOS Press (1997) 23-30. [Maglio & Matlock 1999] Maglio P., Matlock T.: The Conceptual Structure of Information Space.
Munro A., Höök K., Benyon D. (eds.): Social Navigation of Information Space. Springer-Verlag (1999) [Mateas 1997] Mateas M.: Computational subjectivity in virtual world avatars. Dautenhahn K. (ed.):
Proceedings of the AAAI-97 Workshop on Socially Intelligent Agents. AAAI Technical Report FS-97-02 (1997) 87-92 [Mateas 1998] Mateas M.: Subjective Avatars. Proceedings of the Second International Conference on
Autonomous Agents. ACM Press (1998) [McCorduck 1991] McCorduck P.: AARON's Code: Meta-Art, Artificial Intelligence, and the Work
of Harold Cohen. W.H. Freeman and Company, New York (1991) [Microsoft 1998] Microsoft Corporation: Microsoft Agent Programming Interface Overview (1998) [Microsoft 1998b] Microsoft Corporation: Programming the Microsoft Agent Control (1998)
[Mulken et al. 1998] Mulken S. van, André E., Müller J.: The Persona Effect: How Substantial Is It?
Proceedings of HCI'98, Sheffield, UK (1998) 53-66
[Nass & Gong 2000] Nass C., Gong L.: Speech interfaces from an evolutionary perspective.
Communications of the ACM 43(9) (2000) 36-43
[Perlin & Goldberg 1996] Perlin K., Goldberg A.: Improv: A System for Scripting Interactive Actors in
Virtual Worlds. Proc. Siggraph 96, ACM Press (1996) 205-216 [Persson et al. 2000] Persson P, Laaksolathi J., Lönnqvist P.: Understanding Socially Intelligent Agents
– A Multi-Layered Phenomenon. AAAI Fall Symposium, Socially Intelligent Agents - The Human in the Loop, North Falmouth, MA. AAAI (2000)
[Picard 1997] Picard R.W.: Affective Computing. MIT Press (1997)
[Pinhanez et al. 2001] Pinhanez C., Karat C.-M., Vergo J., Karat J., Arora R., Riecken D., Cofino T.:
Can Web Entertainment Be Passive? To appear in Proc. of International World Wide Web 2001 (IWWW'01), Web and Society Track, Hong Kong (May 2001) [Reeves & Nass 1996] Reeves B., Nass C.: The Media Equation: How People Treat Computers,
Television, and New Media Like Real People and Places. Cambridge University Press (1996)
[Rizzo 1999] Rizzo P.: Emotional Agents for User Entertainment: Discussing the Underlying
Assumptions. International Workshop on Affect in Interactions: Towards a New Generation of Interfaces, 21-22 October 1999, Siena, Italy, AC'99, Annual Conference of the EC I3 Programme (1999) [Rizzi 2000] Rizzi A.: Emotional Agents for User Entertainment: Discussing the Underlying Assumptions.
Paiva A.: (ed) Affective Interactions. Springer-Verlag (2000) [Ruttkay et al., 2000] Ruttkay Z.M. et al.: A facial repertoire for avatars, in Proceedings of the
“Learning to Behave: Interacting Agents” workshop, CELE-Twente Workshops on Natural Language Technology, Twente University, October 2000, (2000) [Sengers 1998] Sengers P.: Anti-Boxology: Agent Design in Cultural Context. Carnegie Mellon
University Department of Computer Science, PhD Thesis (1998) [Shneiderman, 1997] Shneiderman B. Direct Manipulation for Comprehensible, Predictable and
Controllable User Interfaces. Moore J., Edmonds E., Puerta A. (eds.): Proceedings of 1997 International Conference on Intelligent User Interfaces, Orlando, Florida. ACM (1997) [Suchman 1997] Suchman L. A.: From Interactions to Integrations. Howard S., Hammond J.,
Lindegaard G. (eds.): Proceedings of Human-Computer Interaction INTERACT’97. Chapman & Hall (1997) [Vilhjalmsson & Cassell 1998] Vilhjalmsson H., Cassell J.: Bodychat: Autonomous communicative
behaviors in avatars. Proceedings of the Second International Conference on Autonomous Agents. ACM Press (1998) 269-276
[Waern & Höök 2000] Waern A., Höök, K.: Interface Agents: A new metaphor for human-computer
interaction and its application to Universal Accessibility. Stephanidis C. (ed.): User Interfaces for All. Lawrence Erlbaum Associates (2000)
[Wright et al. 1998] Wright P., Belt S., Lickorish A.: Animation, the fun factor and memory. Monk A.
(ed.): Proceedings of the Computers and Fun workshop in York. British HCI Group (1998)
ANNEX I: FANTASYA
Carlos Martinho, Marco Vala, Marco Costa, Rui Prada, Ana Paiva, April 12, 2001
AN INVOLVING CONCEPT
FantasyA is a world guided by the Dreams of seven mighty Gods who, although imprisoned in the godly stones, the KemY’s, use the Dreams to impose their will over the inhabitants of the Land. Through the Dreams, each God chose His IdenjaYeh, His first child and, for Ages, the seven IdenjaYeh have been the uncontested rulers over the land of FantasyA. Each IdenjaYeh preaches a different Path, the Path of his God, the Path of one Element. And around the seven Paths are the seven Clans founded.
Guided by the Gods, the IdenjaYeh built the Covenants, the strongholds of the Clans, symbols of Learning and Power. There, the Magi of each Clan, after being called to the Covenant through the Dreams, are trained in the art of AlkemHYe, the art of Elemental control. But one day, the Dreams suddenly ended... and FantasyA was left to its own devices. The IdenjaYeh, foreseeing a time of Chaos and war across the land, left for the KestYe, a quest to recover their Dreams. Long and cold winters passed, until one day... one IdenjaYeh returned.
He was accompanied by a Menjeh, a man who cannot Dream. The IdenjaYeh ordered the Elders of his Clan to train the Menjeh in the ways of his Element. His final words were: “Send him to me as soon as the Training is over”. He added nothing else before silently leaving the Covenant... One year passed and the chosen one finally finished the training. After the Initiation Ceremony, the Menjeh left the Covenant to search for the IdenjaYeh. You are this Menjeh.
THE INITIAL SITUATION
The user plays the role of an Apprentice Magus of one of the Seven Clans of FantasyA. He/she has finished his/her training in AlkemHYe and passed the Initiation Ceremony. He/she is ready to leave the Covenant and look for the IdenjaYeh for further details on his/her quest.
In game terms, after an initial background explanation of the world and the clans, the user must choose a clan. Then, after attending its training, where the laws of YertamenH are explained, and after watching his/her first transformation in the Initiation Ceremony, the Apprentice leaves the safety of the Covenant to look for the IdenjaYeh. Soon, he/she learns that the IdenjaYeh is beyond one of the Portals of the Island. But which one?
A FANTASTIC WORLD
The Known Universe of FantasyA is composed of several Islands lost in Space and Time, but interconnected by a set of Portals. Portals can be activated by the God Stones (KemY’s) and are the only way to voyage between Islands.
Scattered across these Islands are Elemental sources. Each source produces a specific stone composed of one specific Element or a combination of Elements.
The closeness to such sources was the main selection criterion for the location of the Clan Covenants: proximity to these sources strengthens Elemental magic.
Although Space-Time seems to be broken in FantasyA, seven Moons describe irregular orbits around the different Islands, as if the Moons themselves had learned to use the Portals. Each Moon is associated with a different Element, and according to its position in the sky, also influences Elemental magic.
AlkemHYe, as a Science, has studied the interaction of those two phenomena for Eons. Magi quickly learned that duels could be won or lost according to the Elemental strength of the Land (the proximity to the sources) and the Sky (the position of the Moons). Magi also discovered that, according to the position of the Moons over the horizon, a source will produce somewhat predictable variations of the Elemental stones. True Masters will know how to use this information well.
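The Land and Sky influences described above could be modelled, for instance, as multiplicative modifiers on a base Elemental strength. The following Python sketch is purely illustrative; all function names, value ranges and formulas are assumptions for discussion, not the actual SAFIRA game maths (for which see [Martinho 2000]):

```python
# Hypothetical sketch: duel strength combines a Land bonus (proximity to a
# matching Elemental source) and a Sky bonus (altitude of the matching Moon).

def land_bonus(distance_to_source: float, max_range: float = 10.0) -> float:
    """Closer to a matching Elemental source means stronger magic (1.0..2.0)."""
    proximity = max(0.0, 1.0 - distance_to_source / max_range)
    return 1.0 + proximity  # 1.0 far away, 2.0 at the source

def sky_bonus(moon_altitude_deg: float) -> float:
    """The higher the matching Moon over the horizon, the stronger (1.0..1.5)."""
    altitude = max(0.0, min(90.0, moon_altitude_deg))
    return 1.0 + 0.5 * (altitude / 90.0)

def elemental_strength(base: float, distance: float, moon_altitude: float) -> float:
    """Effective Elemental strength of a Magus at a given place and time."""
    return base * land_bonus(distance) * sky_bonus(moon_altitude)
```

A Magus standing on a source under a zenith Moon would thus triple his/her base strength in this sketch; the multiplicative form is only one possible design choice.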
THE MYSTERIES OF ALKEMHYE
The training in AlkemHYe provides the Apprentice with the capacity of manipulating KemY’s, the Elemental stones and the physical form of all power in FantasyA. KemY’s can be used to enhance one’s magical power (by imbuing them into the Magus belt), to perform defensive or offensive actions (by activating them during YertamenH, the Magi duel), to activate the magical Portals between the Islands of FantasyA, or as a mere trading facility.
Each clan favors a certain Element/KemY. The Initiation Ceremony consists of attributing a single pure KemY, representing the Clan Element, to the Apprentice, who imbues it in his/her Magus belt. At this point, the Apprentice will undergo his/her first transformation, as a direct result of imbuing the Clan Stone in the Magus belt.
The Magus Apprenticeship also introduces the Apprentice to the concept of YertamenH, the magical duel between Magi. YertamenH is a social procedure in FantasyA, and no Magus can refuse YertamenH, unless he has already been defeated by the challenger and has not challenged him back in the meantime (the Code of YertamenH protects weaker Magi). YertamenH ends when one Magus gives up or falls unconscious. The winner can take one KemY from the loser’s possession.
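The refusal clause of the Code of YertamenH can be stated as a simple predicate over the duel history. The Python sketch below is an illustrative reading of the rule as worded above; the `(challenger, challenged, winner)` record format is a hypothetical representation, not a prescribed data model:

```python
# Hypothetical sketch of the Code of YertamenH refusal rule: a Magus may
# refuse a duel only if the challenger already defeated him and he has not
# challenged the challenger back since that defeat.

def may_refuse(magus: str, challenger: str,
               history: list[tuple[str, str, str]]) -> bool:
    """history: (challenger, challenged, winner) records, oldest first."""
    last_defeat = None
    for i, (chal, chad, winner) in enumerate(history):
        if chal == challenger and chad == magus and winner == challenger:
            last_defeat = i  # remember the most recent such defeat
    if last_defeat is None:
        return False  # never defeated by this challenger: must accept
    # the protection is lost if the weaker Magus challenged back afterwards
    for chal, chad, _ in history[last_defeat + 1:]:
        if chal == magus and chad == challenger:
            return False
    return True
```

For example, a Magus defeated once by a challenger may refuse a rematch, but forfeits that right as soon as he issues a challenge of his own.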
KemY’s will be the resource the user is looking for, as they are the only way to progress through the game: the only way to become more powerful than the opponents, and the only way to activate the Portals. However, the Apprentice will have to deal with the limits of KemY possession.
First, the Magus has only 3 slots in his/her Magus belt, where he/she can imbue gems that will change his/her appearance and powers (additively). The imbued gems define the energy of the Magus in each Elemental realm. As the game evolves, these levels will change according to the duels undertaken or other activities, such as deep meditation near an Elemental source (which raises the levels back towards their maximum values). Apprentices can (and are encouraged to) use Elements other than the Clan Element; because some stones are composed of more than one Element, such a stone may be the best option to improve the character. Additionally, only 7 gems can be carried in the Magus bag, to be used offensively or defensively during YertamenH, as a trading facility for information, or as keys for activating the world Portals.
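These possession limits amount to a small data structure: a belt of at most 3 imbued gems whose Elemental energies combine additively, and a bag of at most 7 gems. The Python sketch below is an assumed illustration of that structure (class and method names are ours, not part of the SAFIRA design):

```python
# Hypothetical sketch of the KemY possession limits: 3 belt slots whose
# Elemental energies add up, plus a bag of at most 7 gems.

from collections import Counter

BELT_SLOTS = 3
BAG_CAPACITY = 7

class Magus:
    def __init__(self) -> None:
        self.belt: list[dict[str, int]] = []  # each gem: Element -> energy
        self.bag: list[dict[str, int]] = []

    def imbue(self, gem: dict[str, int]) -> bool:
        """Imbue a gem into the Magus belt if a slot is free."""
        if len(self.belt) >= BELT_SLOTS:
            return False
        self.belt.append(gem)
        return True

    def stow(self, gem: dict[str, int]) -> bool:
        """Carry a gem in the Magus bag if there is room."""
        if len(self.bag) >= BAG_CAPACITY:
            return False
        self.bag.append(gem)
        return True

    def energies(self) -> Counter:
        """Current Elemental energy levels: belt gems combine additively."""
        total: Counter = Counter()
        for gem in self.belt:
            total.update(gem)
        return total
```

Note how a multi-Element stone (e.g. Fire plus Water) raises two energy levels with a single belt slot, which is why such stones can be the better choice.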
AGENT INTERACTION
Each character in the story will be implemented as a software agent. The agents can act in the world by moving, manipulating KemY’s, and interacting with other agents (be it trading or duelling).
The only way for an agent to achieve its goals is to get information and obtain KemY’s. This will mainly be the result of interactions between agents, since FantasyA is a limited-resource scenario: most of the time, what an agent needs is in the possession of another agent.
Interaction is a process mediated by two fundamental aspects of each agent: the needs and the personality of the agent.
The needs define the resources required to pursue the agent’s interest, replenishment and active-pursuit goals. Fulfilling those needs in good time will require information (e.g. the location of a KemY or agent). The personality of the agent defines its “trading behaviour”: its eagerness for cooperation or its reluctance towards other synthetic personae, its honesty, its openness towards others, and its belligerency. Friendly agents will gladly trade with you or reveal the location of another agent that has what you are searching for; belligerent ones will readily challenge you to a duel, and the spoils of battle will be the only way to get the key to the next Portal.
Personality will also show through the displayed emotions. Emotions are an important part of the trading process, as they provide information regarding the hidden intentions of the traders.
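One simple way to realise this mediation is to let a few numeric personality traits select the agent’s reaction to an approach. The Python sketch below is a deliberately simplified illustration; the trait names, ranges and decision rule are assumptions, not the SAFIRA agent architecture:

```python
# Hypothetical sketch: personality traits (0..1) mediate how an agent reacts
# when approached - friendly agents trade or share information, belligerent
# ones challenge for YertamenH.

from dataclasses import dataclass

@dataclass
class Personality:
    cooperation: float   # eagerness to trade
    openness: float      # willingness to reveal information
    belligerency: float  # tendency to challenge for a duel

def react(p: Personality) -> str:
    """Pick the agent's reaction to an approach (simplified decision rule)."""
    if p.belligerency > max(p.cooperation, p.openness):
        return "challenge"
    if p.cooperation >= p.openness:
        return "trade"
    return "share-information"
```

A fuller model would also weigh the agent’s current needs and its history with the approaching agent; this sketch only shows the personality side of the decision.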
For more on the concept and the game maths, refer to the “SAFIRA concept” report [Martinho 2000].
REFERENCES
[Martinho 2000] Martinho C.: Safira concept. INESC ID, Technical report. (2000)