
Sensorimotor Primitives for Programming Robotic Assembly Skills



James Daniel Morrow

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Robotics

The Robotics Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213-3890

May 1997

Copyright © 1997 by James Daniel Morrow. All rights reserved.

This research was supported in part by the Department of Energy Computational Science Fellowship Program, by the Department of Energy Integrated Manufacturing Predoctoral Fellowship, by Sandia National Laboratories, and by The Robotics Institute at Carnegie Mellon University. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the funding agencies.

Abstract

This thesis addresses the problem of sensor-based skill composition for robotic assembly tasks. Skills are robust, reactive strategies for executing recurring tasks in our domain. In everyday life, people rely extensively on skills such as walking, climbing stairs, and driving cars; proficiency in these skills enables people to develop and robustly execute high-level plans. Unlike people, robots are unskilled -- unable to perform any task without extensive and detailed instructions from a higher-level agent. Building sensor-based, reactive skills is an important step toward realizing robots as flexible, rapidly-deployable machines. Efficiently building skills requires simultaneously reducing robot programming complexity and increasing sensor integration, which are competing and contradictory goals. This thesis attacks the problem through the development of sensorimotor primitives to generalize sensor integration, graphical programming environments to facilitate skill composition, and design integration to exploit the primitive capabilities during task design. Force- and vision-based sensorimotor primitives are built to provide sensor integration for task domains, not task instances. A task-level graphical programming environment provides an intuitive interface for the designer/programmer to compose the skill through a simple form of virtual demonstration. Finally, design-agent representations of the primitives inside the CAD environment interact with the task model and the programmer to assist in composing the skill program. Experimental results for six different assembly tasks demonstrate sensor-based skill execution and illustrate primitive reusability.


Acknowledgments

Thanks to my advisor, Pradeep Khosla, for assembling a first-rate research group and facility in the Advanced Mechatronics Laboratory, and for giving me the freedom and support to pursue my own research direction. Thank you also to my committee members Mike Erdmann, Lee Weiss, and Brenan McCarragher for their interest and insights. Thanks to the Department of Energy Computational Science and Integrated Manufacturing Fellowship programs for their financial support of this research.

Many people in our lab group helped me considerably. Brad Nelson encouraged me often early in my PhD and provided valuable help on visual servoing techniques. Brad was also instrumental in guiding my research direction toward task-based primitives. Dave Stewart built Chimera and a number of software tools from which I benefited greatly throughout this research. Richard Voyles and I collaborated on research ranging from shape-from-motion calibration to M&M sampling. Rich provided subliminal messages and lots of laughs in addition to his hardware and real-time system expertise -- it wouldn't have been the same without him. Chris Lee and I had many discussions about building an event-driven level for Chimera -- these conversations had a strong impact on the 4th-level implementation that I developed. Chris Paredis helped me with Telegrip in addition to providing stiff racquetball competition. Thanks to the members of the AML group for feedback during our AML meetings on my research, and to Henry Schneiderman, Carol Hoover, Chris Paredis, Chris Lee, and Richard Voyles for particularly helpful comments on my defense presentation.

The Robotics Institute is a great place to do graduate work. I really appreciate the terrific environment, resources, and especially the people -- students, faculty, and staff -- who are approachable and engaged in interesting research. I don't think you could find a more open, supportive, and stimulating environment.

Most of all, I want to thank my wife, Joan, who made all of this possible. Her unwavering love and support as 4 years stretched to 6 was instrumental in my completing the PhD. Joan gave me the most wonderful gift 22 months ago: Maria Catherine. Aside from being loads of fun, Maria is a constant reminder of how far we still have to go in robotics.


Contents

Chapter 1: Introduction
  1.1 The Problem
  1.2 Prior Work in Skill Synthesis
  1.3 Skill Synthesis Problems
  1.4 Technical Approach
  1.5 Contributions
Chapter 2: Manipulation Task Primitives
  2.1 Introduction
  2.2 Manipulation Task Primitive Classifications
  2.3 Manipulation Task Modelling
  2.4 Current Robot Programming Primitives
  2.5 Resources, Strategies, and Task Uncertainty
  2.6 Complex Tasks, Primitives, and Skills
  2.7 Summary
Chapter 3: Sensor and Motor Resources
  3.1 Introduction
  3.2 The robot: a 'motor' resource
  3.3 Force Sensing and Control
    3.3.1 Sensor Configuration and Signal Processing
    3.3.2 Control
    3.3.3 Trajectory/Setpoint Specification
  3.4 Visual Servoing
    3.4.1 Feature Tracking
    3.4.2 Image Plane Errors
    3.4.3 Image Jacobian
    3.4.4 Control
  3.5 Summary
Chapter 4: Sensorimotor Primitives
  4.1 Introduction
  4.2 Sensorimotor Primitive Structure
  4.3 Visual Constraints
  4.4 Constraining One DOF
    4.4.1 aab
    4.4.2 aac
  4.5 Constraining Two DOF
    4.5.1 abb, acc
    4.5.2 abc
    4.5.3 aad
  4.6 Constraining Three DOF
    4.6.1 bbc, bcc
    4.6.2 bbb, ccc
    4.6.3 abd, acd
  4.7 Constraining Four DOF
    4.7.1 add
    4.7.2 bbd, ccd
    4.7.3 bcd
  4.8 Specifying Five DOF
    4.8.1 bdd
    4.8.2 cdd
  4.9 Transition Primitives
    4.9.1 Guarded Move
    4.9.2 fstick
    4.9.3 Dithering and Correlation
    4.9.4 Dither Combinations
  4.10 Summary
Chapter 5: Robot Skills
  5.1 Skills as Primitive Compositions
  5.2 Chimera Agent Level
  5.3 Example Skills
    5.3.1 Square Peg Insertion
    5.3.2 Triangular Peg Insertion
    5.3.3 BNC Connector Insertion
    5.3.4 D-connector Insertions
    5.3.5 Press-fit Connector
  5.4 Summary
Chapter 6: Design Agents
  6.1 Introduction
  6.2 Chimera/Coriolis/Telegrip (C2T) System
  6.3 CAD-based Skill Composition
  6.4 Design Agents
    6.4.1 A Simple Example: movedx
    6.4.2 Vision Primitive Design Agents
      6.4.2.1 vis_aab
      6.4.2.2 vis_bcc
    6.4.3 Force Primitive Design Agents
      6.4.3.1 Guarded Move
      6.4.3.2 L2 and LR Dithers
  6.5 Sensor-based Simulation Results
  6.6 Interactive Software Components
  6.7 Summary
Chapter 7: Conclusions
  7.1 Summary
  7.2 Development Extensions
  7.3 Future Research
    7.3.1 Task Uncertainty
    7.3.2 Parameter Adaptation and Optimization
    7.3.3 Parametric (Feature-based) CAD
    7.3.4 Design for Sensor-based Assembly
    7.3.5 Interactive Software Components
Chapter 8: References
Chapter 9: Appendix
  9.1 Chimera Agent Level
    9.1.1 Software Design Issues
      9.1.1.1 High-Level Design
      9.1.1.2 Messaging and Control Flow
    9.1.2 Agent Object Definition
    9.1.3 FSM Object Definition
    9.1.4 When should I use an agent or an fsm?
    9.1.5 Event Generation and Processing
    9.1.6 Comparison to Onika
    9.1.7 Code Distribution
  9.2 Agent User's Guide
    9.2.1 Getting Started
    9.2.2 Agentcmdi User Interface
    9.2.3 Skill Programming Interface (SPI)
    9.2.4 Useful Agent Function Listing
    9.2.5 Limitations
  9.3 Control Agent Example
    9.3.1 Configuration File
    9.3.2 Source Code
  9.4 C2T: CAD Integration
    9.4.1 Communications
      9.4.1.1 Telegrip/Chimera
      9.4.1.2 Telegrip/Coriolis
      9.4.1.3 Coriolis/Chimera
  9.5 Skill Listings
    9.5.1 Square Peg Insertion
    9.5.2 Press Fit Connector

Chapter 1
Introduction

1.1 The Problem
Adaptable, intelligent robot programs require sensor integration. Lack of sensing leads to very brittle robot control programs which cannot adapt to uncertain knowledge about the task during programming or to imperfect control during execution. But sensor integration increases programming difficulty, leading to a trade-off between programming difficulty and program adaptability. This undermines robotics' 'niche' as flexible automation for medium-volume manufacturing, between people and hard automation. Realizing rapidly deployable systems requires simultaneously integrating sensors while reducing the programming burden. This thesis addresses the problem of efficient sensor-based skill programming for robotic assembly tasks.

The difficulty of robot programming has been known for some time. Automatic planners were developed so that tasks could be described at a very high level and a robot program automatically generated. The problem is that we ask too much of planners -- that they give us a plan in terms of robot motions which will execute in spite of the world-modelling errors made at plan time and the uncertainty inherent in the real world.

The sequence of steps can be recovered at plan time, but for tasks involving uncertainty the details of robot motion often cannot. These planners need more powerful, reactive commands -- skills.

A skill represents an innate capability to perform a specific (and usually recurring) task. Examples from everyday life include walking, opening doors, climbing stairs, and driving. These skills enable people to execute high-level plans and limit the necessary level of communication detail. A skill is specific to a task but possesses the ability to deal with uncertainty and variations in that task. For example, a door-opening skill might effectively deal with different handles, different sizes and weights of the door, and spring/damper mechanisms attached to the door. Embedded in these skills is appropriate sensor use to resolve limited task uncertainty. In contrast to people, robots are unskilled -- unable to do any task without extensive and detailed directions from a higher-level system. Developing robot skills for recurring, complex tasks would significantly impact the robot programming problem. Integrating such skills with high-level plans is an instantiation of the general principle of combining deliberative planning and reaction, which is widely accepted in robotics today [35].

In the robotics literature, skills mostly refer to sensorimotor mappings which are often learned. In this thesis, a skill implements a particular algorithm (usually multi-step) to solve a specific manipulation task. There is a many-to-one mapping between skills and a task -- that is, many different strategies or algorithms can be employed to solve a particular manipulation task. Simon et al [61] investigated this question for snap-fit tasks with a single-step strategy. One of the key conclusions they drew was that the optimal strategy (the 'best' parameter set of the move command) was a function not just of the task, but of the performance index chosen. The focus of this research is not on building optimal skills for specific tasks, but on developing and implementing an approach to efficiently building satisficing skills for several different, yet related, tasks by transferring portions of a skill for one task to a skill for a similar task. Assembly tasks are targeted since contact tasks tend to be the most difficult robot tasks to program.

Fitts [63] identified three phases of human skill acquisition: cognitive, associative, and autonomous. During the cognitive phase the basic strategy is recovered. During the associative phase the skill performance is improved through practice. During the autonomous phase the skill is replayed, usually with little or no cognitive effort. In reality these phases are not distinct but overlap. Given the definition of a robot skill as a particular algorithm implementation for a specific task, robot skill synthesis is a two-part process: 1) strategy development in terms of the task representation, and 2) strategy translation for implementation on a particular robot/sensor system.

Robot skills are difficult to synthesize for a number of reasons. Complex tasks are difficult to capture in a model. Interpreting the sensor signals to extract task information useful for controlling and monitoring the task is very difficult and time-consuming, yet sensor use is necessary for dealing with task uncertainty and recovering from execution errors. The gulf between the task space and the robot/sensor spaces makes strategy translation difficult: strategies which are developed in terms of the task must be translated and executed in terms of the robot/sensor system. Strategies must be implemented as real-time software, and the significant detail required makes strategy implementation tedious and error-prone. Finally, task similarities are rarely directly exploited in the construction of new skills.

Hardware and software composition for rapidly-deployable systems has been pursued over the last decade in the Advanced Mechatronics Lab at Carnegie Mellon University, and we are leveraging portions of this past work to develop a system for efficiently building sensor-based skills. Paredis [55] developed the remote modular manipulator system (RMMS) and accompanying task-based design software to create custom, fault-tolerant manipulators for a specific path-tracking task. The reconfigurable software framework of Stewart [66] provides a structure within which to write reusable, real-time software modules based on a port-based object concept. The resulting framework allows the control system designer to focus on algorithm development, not real-time implementation details. Gertz [21] developed Onika, a graphical interface to the Chimera reconfigurable software, which allowed novice users to quickly compose pick-and-place robot task programs. Carriker [13] developed a system to map high-level assembly plans generated with Mattikalli's work in mechanics-based motion planning [38] into real-time software modules for execution. Recent efforts focus on extending the robot programming paradigm from textual, low-level programming to demonstration [28] and composition.

The idea is to provide a more intuitive programming interface for the user and to map the demonstration onto sensor-driven primitives or skills for robust execution. Rather than recordings of the robot's kinematic trajectories, the resulting programs are represented in terms of the task and how it projects onto the robot's sensors. This provides robustness to inevitable world-modelling errors during program composition and to imperfect control during execution. As part of this effort, Voyles [75] is developing a novel tactile sensor/actuator based on magneto-rheological fluids to provide tactile feedback both during programming for the human and during execution for the robot.

This thesis addresses robotic skill acquisition for assembly tasks through building force- and vision-based sensorimotor primitives and integrating them into a CAD environment to support concurrent task design and task-level skill composition.

1.2 Prior Work in Skill Synthesis
Since skill synthesis fundamentally involves constructing robot programs to perform tasks, previous work in robot programming methods is relevant. More recently, skills have been specifically mentioned in the robotics literature ([1],[24],[28],[32],[53],[64],[71],[79]). Previous work in robot programming falls into four categories: 1) explicit programming, 2) automatic planning, 3) demonstration, and 4) learning. Each of these areas is discussed next in the context of skill synthesis.

Explicit programming of robots requires the translation of a task strategy onto the robot/sensor system. For pure positioning tasks, robot programming is relatively straightforward (though tedious), and requires well-calibrated robots and workcells. For contact tasks (e.g. assembly), the program is much harder to create. The chief difficulty is predicting how different task conditions appear in the sensor space. Whitney [78] and Strip [68] both analyzed insertion tasks in terms of their contact states for predicting the sensor-space mappings of different task conditions. This analysis was then used to derive a sensor-based strategy for accomplishing the task. Schimmels and Peshkin [57] have synthesized admittance matrices based on task contact models as well.

Assuming the task parameters are known and the initial conditions are satisfied, these algorithmic methods are guaranteed to succeed. The difficulty with contact-state analysis is that it is difficult to perform on complex tasks: even relatively simple tasks can have a large number of contact states. Other researchers have used heuristic methods for deriving task strategies for explicit programming. Paetsch and von Wichert [52] synthesized peg insertion strategies using a dextrous hand based on heuristic behaviors observed in human insertions. Michelman and Allen [41] constructed a strategy to remove a child-proof screw cap using a dextrous hand. These methods do not provide a guarantee of success, but are experimentally verified. Both of these efforts ([52],[41]) involved the use of hand control primitives to simplify programming tasks with a multi-fingered hand.

What are the weaknesses of explicit robot programming? The level of detail which must be specified for the robot to perform a task is significant; one rarely realizes just how much until one programs a robot to perform a simple task. Add in error recovery and sensor use, and the programming burden is onerous. The real-time nature of the software implementation requires additional knowledge in the creation of real-time systems. The languages currently available for robot programming (e.g. VAL, AML) encapsulate useful robot motion primitives, but they are robot-centered, which makes strategy translation a tedious, mistake-prone process. Robot programmers need to be both robotics and task experts; highly-skilled technicians and engineers are required for programming robots, whereas the factory floor worker is the preferred programmer.

The need to further ease robot programming spawned the field of automatic planning. The basic idea behind automatic planning [33][29] is to represent the task in a state space along with operators which transform states. The task is defined as a transition from a given initial state to a final state (or sets of states). The planning problem is to find the sequence of operator instantiations which will achieve that task through (intelligent) search of the state space. Lozano-Perez et al [34] applied this approach as a general solution to planning fine motions. When the prior knowledge is accurate, this method provides good solutions. However, large state spaces and deep searches create combinatorial search explosion, making such planning methods infeasible for many real tasks.

In addition, since the a priori knowledge is approximate, the resulting plans may not execute properly due to inevitable discrepancies between the real world and the robot's state-space representation of it. Planning is effective at the symbolic level to capture high-level strategies. However, in the author's opinion, pushing automatic planners to handle very low-level detail (e.g. contact states) is ill-advised, since the complexity increases sharply and the reliability of the resulting plans plummets. The strategies required for fine-motion tasks involving contact are strongly influenced by task design and should be addressed through specific task features which facilitate robust mating behavior (e.g. chamfers).

The third method of robot programming, human demonstration, has been around the longest but is currently undergoing some fundamental advances. Teaching the robot a path by operating a teach pendant (or by physically guiding the robot) is the most direct method of programming robots. This method is often performed by factory floor personnel after some initial training and does not require advanced knowledge of robotics (e.g. kinematics, control, etc.). This method essentially "compiles" the strategy into the robot space very early. However, the resulting programs are very brittle because small workcell perturbations or robot calibration errors will make them fail. Robots executing such programs usually do not use sensors to resolve uncertainty and adapt to changes in the environment. In addition, direct teaching is not appropriate for many tasks involving significant contact, since poor tactile and force feedback devices make it difficult for the operator to teleoperate the robot through a contact task.

More recently, "learning from observation" methods have been developed whereby an operator performs the task and the robot observes this demonstration and generates its own program. Kang [28] developed a learning-from-observation system which uses computer vision to observe a task demonstration, segment the strategy, and then map it onto a multi-fingered hand/arm system. Learning from observation is a much more intuitive method of programming robots since it allows the human to demonstrate the task through natural performance (i.e. not through the robot). However, significant research issues remain for this method. One of the most significant is the mismatch between the human's effectors and sensors and the robot's effectors and sensors [2].

The robot system must be able to sense appropriate task features to guide the strategy. For example, if the task is a close-tolerance insertion and the robot has no force sensor, then it is highly unlikely that a successful strategy will be recovered based on vision sensing alone, since the human relies heavily on tactile and force feedback during the final stage of insertion. So far, efforts in learning from observation have focused on positioning tasks. Further research is needed to extend these methods to contact tasks which require tactile and/or force feedback.

Learning from observation is an example of a larger class of methods called supervised learning. In supervised learning, a teacher (programmer) provides correct demonstrations of the task for the "student" (robot system). Yang et al [79] have applied hidden Markov models to skill modelling and learning from telerobotics. The skills learned are manipulator positioning tasks without force or vision feedback. Some researchers are applying supervised learning approaches to recover strategies for contact tasks like deburring. Liu and Asada [32] use a neural network to recover an associative mapping representing the human skill in performing a deburring task. The task is performed using a direct-drive robot with low friction and a force sensor integrated with the workpiece. This allows the human to perform the task with little interference from the robot while much relevant information is measured by the robot sensors. Shimokura and Liu [60] extend this approach with burr measurement information. ALVINN [56] is a car-steering skill acquired by observing a human drive a car and training a neural network on the input road images and the human's steering commands. The advantage of supervised learning methods is their efficiency -- the teacher provides "directed" learning. The primary difficulty with applying supervised learning methods to contact tasks is the difficulty of collecting data and/or performing the task through the robot.

Because of these problems, most work in applying learning to contact tasks has focused on reinforcement learning. Rather than a teacher, reinforcement learning has only a critic. The difference between a teacher and a critic is that the teacher provides feedback on how to modify the action to improve performance, whereas a critic provides only an evaluation of performance. The student is left to discover, on his own, the proper actions to improve performance. In general, supervised learning will result in faster learning because the system is being "taught" how to improve.

So why use reinforcement learning? Because a teacher is not always available. This is frequently the case in contact tasks, given the difficulty humans have in evaluating sensor signals in the context of the task. Many researchers have applied reinforcement learning methods to the recovery of a peg insertion skill. Simons et al [62] learn how to interpret a force feedback vector to generate corrective actions for a peg insertion which has an aligned insertion axis. The output motion commands are restricted to the plane normal to the insertion axis. Changes in force are used to reinforce (penalize) the situation/action pair. Gullapalli et al [23] learned close-tolerance peg insertion using a neural network and a critic function consisting of the task error plus a penalty term for excessive force. The input to the network was the position vector and force vector, and the output was a new position vector. About 400 training trials are required to recover a good strategy. However, the learned "skill" is specific to the peg geometry on which it was trained and to the location of the peg in the workspace, because the absolute peg position is produced as output. If the peg location were moved in the workspace, this skill would fail, because it would be very difficult to specify the relative transformation between the new location and the training location accurately (relative to the insertion clearance). A few more training trials would probably suffice for learning the new skill, but one does not want to learn and store a different skill for every location in the robot's workspace. Vaaler and Seering [72] have applied reinforcement learning to recover production rules (condition-action pairs) for performing a peg insertion task. The critic function is a measure of the forces produced by the last move increment; higher forces are penalized. The termination conditions are an absolute Z position (Z is the insertion axis) and a Z force large enough not to be caused by 1- or 2-point contact (common during insertion). Ahn et al [2] learn to associate pre-defined corrective actions with particular sensor readings during iterative training and store these mappings in a binary database. Again, the critic function penalizes moves which increase the measured force. Kinematic analysis of the task can be used to "seed" the database with a priori knowledge, but this is not necessary for the method to succeed. Lee and Kim [31] propose a learning expert system for the recovery of fine-motion skills.

Skills are represented as sets of production rules, and expert a priori knowledge is used. The critic function is the distance between the current state and the goal state, but does not explicitly include an excessive-force penalty. The method is tested on a simulated 2D peg insertion task.

This section reviewed the major categories of robot programming relevant to skill synthesis and cited some examples from the literature. Most of the work which specifically pursues skill synthesis relies on reinforcement learning to recover strategy mappings which are difficult to recover directly through task modelling. The next section outlines specific skill synthesis problems.

1.3 Skill Synthesis Problems

Skill Representation. "Black-box" skill representations are seemingly attractive because they hide the internal detail of skills and support the common view of skills as encapsulated programs. In addition, black-box representations are consistent with the view of skills as task-specific and difficult to articulate. Consider, for example, neural network implementations of skills -- each network is a black box, and portions of those networks are not re-used for similar tasks. O'Sullivan et al's [50] work with explanation-based neural network (EBNN) learning trains various networks with "domain theory" and then combines them to aid learning a new task. "Domain theory" refers to the aspects of the tasks which are transferable. However, the learned task network is independent of the domain-theory networks (their functionality is duplicated in the task network), and the domain theory is used only to speed learning. Black-box skill representations do not support the skill transfer which is necessary to rapidly create new skills. To facilitate skill transfer between related tasks, a skill representation must be developed in which well-defined primitives are combined to form skills. The lack of such a skill representation has prevented the direct exploitation of task similarities when developing new skills. Gullapalli et al [23] exploit a round peg insertion skill to learn a square peg insertion skill, but the final result is two unrelated black boxes.

The work involved in developing skills can be categorized into domain-specific (applicable to other, related tasks in the domain) and task-specific (applicable only to the specific task). A task is a very specific problem instance, while a task domain is a set of similar or related tasks. One of the key assumptions in this research approach is that skills for similar tasks can share the same sensor-based primitives. The goal with a primitive-based skill representation is to focus on developing domain-specific primitives and thereby amortize the primitive development effort over a task domain. In addition, the existence of a library of task-relevant, sensor-integrated primitives will facilitate the construction of new skills in the domain. Since skills are strategies for complex tasks, the primitives can be associated with subgoals of these tasks. This divide-and-conquer approach is an effective way to develop solutions for complex problems. Primitive-based skills may also be an effective way to pursue the longer-term goal of skill morphing, whereby a new skill is automatically constructed from existing skills. Exploiting task similarities for skill transfer has not been thoroughly researched in previous skill synthesis work.

Sensor Integration. Sensor application is critical to improving the robustness of skills by providing the ability to express and execute the strategy in terms of the task. Because it is tedious and difficult to accomplish, it should be encapsulated for re-use whenever possible. Most robot programming continues the separation of sensing and action. Ahn et al [2] connect task-specific error-corrective actions to sensor signals through on-line learning. Erdmann [18] analyzes the information requirements of a task through the design of abstract sensors for a specific task. Other methods [23][32] integrate sensing and action into neural networks which are task-specific. The availability of structured, sensor-integrated commands is very limited: guarded moves and compliant motion are the only common sensor-integrated commands in use today. The lack of sensor-integrated commands forces the burden of sensor integration onto the application programmer for each task instance. It is up to the application programmer to determine how to use the raw sensor data to monitor and control the task. As extracting information from the sensor signals is very difficult, integrating sensors on a task-by-task basis is very inefficient, especially for complex and difficult tasks.

One approach to applying sensors directly to a complex task hides the sensor use inside a "black box" (e.g. a neural network). While sensor-based strategies for difficult tasks can be recovered this way, it perpetuates the per-task integration of sensors. Another method is to break the complex task into reusable subgoals which are tractable for sensor application. An intermediate sensorimotor layer based on these subgoals can integrate sensing into reusable primitives. This would extend the sensor-integrated command library available for robot programming and provide sensor-integrated primitives which could be quickly composed into task-specific solutions.

Programming Complexity. A persistent problem with robot programming is its complexity. Every detail must be specified for the program, which is tedious and error-prone for a human. In addition, the task strategy is developed and represented in terms of the task, yet it must be translated and executed in terms of the robot/sensor system. Sensor integration further exacerbates this problem by increasing the complexity of the programming problem. The existence of robot programming languages (e.g. AML, VAL) has impacted the problem through the integration of robot-centered language primitives. These languages, however, have only robot domain knowledge, not task domain knowledge. Task-relevant (not just robot-relevant) commands would significantly impact the programming problem by providing more direct primitives in which to develop and execute task strategies. In addition, the judicious use of graphics in a programming environment can significantly ease the programming burden. This is currently evident in the graphical block-diagram building tools of software packages like MATLAB.

1.4 Technical Approach
The thesis goal is to conceive, develop, and demonstrate a framework in which sensor-based robotic assembly skills can be efficiently composed by a human programmer. Rather than trying to fully automate skill synthesis, the approach builds skills out of parameterized primitives which are reusable and provides graphical programming environments to facilitate the composition of these primitives into event-driven skills. The framework allows the insertion of more sophisticated algorithms when they are available, but the emphasis is on supporting the human, not replacing him.

One difference between this work and that of previous researchers in skill synthesis is that the focus is on building multiple difficult skills, not learning one difficult skill. The three skill synthesis problems outlined in the previous section are addressed as follows: sensorimotor primitives are developed to integrate sensors for a broad class of tasks; skills are represented as finite-state machines of primitives to encourage reuse; and programming complexity is controlled through graphical programming and design agents which assist the skill designer/programmer in the CAD design/programming environment.

Sensorimotor primitives integrate force and vision sensors for a class of tasks (a task domain) rather than for individual task instances. The primitives may be thought of as specialized commands for a particular task domain. This is similar to software packages like MATLAB, which facilitate the construction of control system software through the incorporation of domain-specialized building blocks. To create such a primitive set, one must first identify what is common or similar about the set of tasks, to guide the development of primitives useful for the task domain. For assembly tasks, the similarity is relative-motion constraints. A taxonomy identifies 20 different relative-motion classes which are used to guide sensor-based primitive development. Each class has multiple interpretations of the constraint in terms of geometry and mechanics. In addition, the constraints may be defined through contact or non-contact interpretations (or combinations of contact and non-contact). Figure 1-1 shows a graphical depiction of the idea. The primitives occupy an intermediate level between the task domain and the generic sensor and action spaces of the robot. Building primitives requires using task information to interpret sensor signals. Dealing with the raw sensor and action spaces is "high-dimensional" because many details must be specified. More importantly, many of these details are not specific to a particular task. For example, a particular controller, trajectory, and event-detector set may be applicable to many tasks. The goal is to capture the reusable development work in parameterized primitives which can be quickly re-applied to other tasks. The interface between the task space and the sensorimotor space is "low-dimensional" in that relatively few details must be supplied to select and parameterize the primitives into a specific program for a particular task, if the appropriate primitives exist.

Figure 1-1: Sensorimotor Space (the sensorimotor primitive space forms a low-dimensional interface between the task space and the high-dimensional sensor and action spaces)

Since skills are typically built for complex tasks, they are usually multi-step algorithms. Finite-state machines are natural representations of primitive-based skills, which require discrete changes in the subgoal and the corresponding execution primitive. The graphical nature of representing and interpreting FSMs supports GUI-based skill composition. An FSM graphical editor provides the ability to view and edit the state machines which encode skill programs.

Explicit task design for robotic execution is critical. Since a skill represents in some sense the 'hardest' part of the overall application, it is appropriate to consider the execution strategy and the task design concurrently and to involve the human in the development. This co-design approach respects the fact that poor task design generally cannot be overcome by more complex execution algorithms. It also respects the fact that automatic planners, at their current stage of development, are unable to deal with these very complex tasks. The primitives represent the most cost-effective sensor-based commands (because they have already been developed) and should be used whenever possible to maximize their return on investment.
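
To make the composition idea concrete, the sketch below shows a skill as a small finite-state machine whose states invoke parameterized primitives and whose transitions are driven by the events those primitives report. It is only an illustration of the structure described above; the class names, parameters, and event strings are hypothetical, not the Chimera interfaces used in this thesis.

    class Primitive:
        """A parameterized sensorimotor primitive: runs until it reports a termination event."""
        def __init__(self, name, **params):
            self.name, self.params = name, params

        def run(self, robot):
            # Placeholder: a real primitive drives controllers/trackers from sensor
            # feedback until a termination event fires, then returns that event's name.
            return self.params.get("stub_event", "fail")

    class Skill:
        """A skill is a finite-state machine over primitives (states) and events (edges)."""
        def __init__(self, start, transitions):
            self.start, self.transitions = start, transitions  # {(state, event): next_state}

        def execute(self, robot, states):
            state = self.start
            while state not in ("done", "fail"):
                event = states[state].run(robot)
                state = self.transitions.get((state, event), "fail")
            return state

    # Hypothetical two-step insertion sketch: approach until contact, then comply downward.
    states = {"approach": Primitive("guarded_move", axis="z", f_threshold=2.0, stub_event="contact"),
              "insert":   Primitive("accommodate", stiffness=0.002, stub_event="inserted")}
    skill = Skill("approach", {("approach", "contact"): "insert",
                               ("insert", "inserted"): "done"})
    assert skill.execute(robot=None, states=states) == "done"

Swapping the peg-insertion primitives for connector-insertion ones changes only the state table, which is exactly the reuse this chapter argues for.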

The tight integration of the primitives into the design environment allows the user to modify both the task design and the strategy design when developing a new task/skill pair. To exploit the primitive capabilities at the design stage, the concept of interactive software components (ISC) extends software primitives to have both execution and design components. The software composition process then becomes a collaboration between the human programmer and the software components which make up the program; ISCs are thus active participants in their own incorporation into the software. This is an important concept for controlling programming complexity as software components gain capability and complexity. To demonstrate this concept for the assembly domain, a CAD environment, the real-time system, and a rigid-body simulation capable of modeling collisions and impacts have been tightly integrated. The definition of a primitive is extended to include a set of design agents in the CAD environment to support the primitive's use. These design agents do not provide design advice; rather, they represent the primitive capabilities inside the design environment. The primitive is the execution component, while the design agents are the design component of the software. The integrated system allows the designer to instantiate an assembly strategy in terms of available primitives and evaluate performance via simulated execution, including force and vision sensor feedback.

Figure 1-2 shows a graphical representation of the framework components and their connections. Underlying the approach are the sensorimotor primitives, which were introduced above and are discussed in more detail in Chapter 4. Guiding the development of sensorimotor primitives are manipulation task primitives, indicated by the MTP block in the figure and discussed in Chapter 2. The sensor space refers to a force sensor and a vision sensor, and the action space refers to Cartesian motion capability; these resources are described for completeness in Chapter 3. Primitive-based skills are described in Chapter 5 for six different real-world tasks, including canonical peg insertions and difficult connector insertions. The extension to the Chimera real-time operating system for executing these event-driven skills is also described in Chapter 5. The design integration of the sensorimotor primitives is discussed in Chapter 6, which covers the design agents, Telegrip, and SIM blocks in the figure. Finally, conclusions are drawn, limitations discussed, and future work outlined in Chapter 7.

Figure 1-2: Skill Synthesis Framework (Telegrip CAD: task design, design agents; Chimera real-time system: skill program, sensorimotor primitives, sensor space, action space; SIM; MTP)
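
A minimal sketch of the interactive-software-component split between execution and design halves follows, purely as an illustration; the class and parameter names are invented here and are not the Telegrip design-agent API.

    class DesignAgent:
        """Design-time half of a primitive: queries the task model and the user to
        propose parameters for its execution-time counterpart."""
        def propose(self, task_model, ask_user):
            raise NotImplementedError

    class GuardedMoveAgent(DesignAgent):
        def propose(self, task_model, ask_user):
            axis = task_model["approach_axis"]                       # taken from CAD geometry
            force = ask_user("contact force threshold in newtons?", 2.0)
            return {"primitive": "guarded_move", "axis": axis, "f_threshold": force}

    # Composition as collaboration: the agent fills in what the task model knows,
    # and the programmer supplies what it does not.
    step = GuardedMoveAgent().propose({"approach_axis": "z"}, lambda prompt, default: default)

The returned parameter set is what would be handed to the execution component, i.e. one step of a skill state machine like the one sketched earlier in this section.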

1.5 Contributions
• A primitive taxonomy based on relative motion between two rigid parts is used to drive sensorimotor primitive development. My contribution involves both a different representation of the taxonomy and, more importantly, the connection of the taxonomy to executable sensor-based primitives. This extends the use of the taxonomy from an abstract concept to one that motivates and drives executable primitive development. The key concept is that the specific task features which actually define the motion constraint are directly sensed for task control.
• A collection of sensorimotor primitives was built based on the task primitive taxonomy classification. A novel dithering/correlation motion constraint detector was developed, along with a combined force/vision primitive for implementing 3-translation-DOF constraints. Existing algorithms for guarded moves, accommodation, and visual servoing were also incorporated to realize other task primitives. Sensorimotor primitive reusability is demonstrated through the construction of skills for different difficult tasks using these common sensorimotor primitives.

• Skills are represented as finite-state machines to leverage primitive development. To execute these skills, this thesis extends the Chimera reconfigurable software framework to a "4th level". This level provides two new objects: one which processes events and one which processes both events and data. It complements the periodic, data-driven nature of the 3rd-level objects with asynchronous, event-driven 4th-level objects. This 4th level has been integrated with a GUI for editing state machines and a CAD system for building the skills through a task-level programming interface.
• An important contribution of this thesis is the extension of the sensorimotor primitives from the execution system (Chimera) to the task design system (Telegrip) as design agents. Because any primitive set has limitations, it must be deliberately exploited during the task design stage. The design agents represent the primitives inside the design environment to facilitate their incorporation into programs; design agents do not modify the task design, rather they assist the user/programmer in selecting and parameterizing the primitives to execute particular steps of the strategy. The broader concept of Interactive Software Components (ISC) considers software objects as having both execution and design components. The execution components appear as part of the final program. The design components interact with the user and task model during the software composition process and range from simple data-entry assistance to automated selection and parameterization algorithms. The ISC concept transforms the composition of skill software into a collaborative process between the primitives and the human programmer.
• The seamless integration of the CAD design environment with the Coriolis mechanics-simulation package and the real-time system provides some unique capabilities. Executing the same primitive object code which runs the actual robot removes the possibility of code mismatch between the design/programming (simulation) stage and the robot (real) execution stage.

It also provides a useful environment for developing sensor-based algorithms, especially for contact tasks, without fear of damaging the hardware. The Coriolis integration provides the ability to simulate contact tasks involving collisions and impacts, which is crucial for modelling assembly tasks and providing simulated force feedback. Finally, by combining the ability of Telegrip to generate camera views inside the CAD world with the Silicon Graphics workstation's ability to output video from the screen, 'synthetic' video is supplied to the image processing system. Visual tracking algorithms thus implement the vision-driven primitives, instead of pin-hole camera feature projections which simply assume that visual tracking is possible. With advances in CAD system visual rendering, this capability will be important in modelling the photometric effects which are so important to visual tracking algorithms.
• Six difficult, real tasks were robotically executed using primitive-based skills. The experimental results show the successful execution of these tasks with common sensorimotor primitives. These tasks include canonical peg insertions as well as more difficult connector insertions, including one involving a press fit requiring significant force.


Chapter 2
Manipulation Task Primitives

2.1 Introduction
Programming complexity is controlled through the introduction of higher-level programming abstractions relevant to the problem domain. Maple and Mathematica are examples of this approach from the mathematics domain. These packages are targeted at a particular domain and are not generally helpful for a different domain. The benefit of these packages is the encapsulation of domain knowledge in a form which is easily composed into task-specific solutions. The availability of domain-relevant primitives greatly shortens the time required to synthesize a specific task solution compared to using a general-purpose programming language (e.g. C or FORTRAN). The benefit is largely due to the directness with which a problem solution can be expressed in the language -- there is little 'translation' to be done compared to a general-purpose language. This thesis investigates the creation of such a package for a certain class of robotic tasks -- rigid-body assembly.

Robotics-relevant primitives and structures (e.g. move commands and homogeneous transforms) are embedded in languages like VAL and AML.

However, these primitives are robot-centered and have no information about the task domain -- they are task-domain neutral. These capabilities are focused on the robot, not on a particular task domain. Although they do help the robot programming problem through the integration of robot-centered knowledge, they ignore the most difficult aspect -- translation of the task solution into robot primitives. Attempts to automate the translation process through task planners [29] have met with some success, but have not generally solved the problem. One reason is the lack of powerful primitives in which to terminate the plans. As a result, task planners are forced to generate very fine-granularity plans even though their information may not be reliable enough to do so. Besides being computationally expensive (and combinatorially explosive), the resulting plans are rarely robust because some of the information upon which they are based is suspect. In addition, significant task uncertainty can make developing such a priori plans very difficult, if not impossible, especially in a robot-centered form. What is needed is to instantiate a task plan which remains in a 'task-centered' form until run-time, at which point it is translated into specific robot motions. Accomplishing this requires the development of task-relevant primitives for the particular task domain, implemented in terms of specific robot resources.

To robustly execute task-relevant functions at run-time, primitives must be task-driven. The overall goal is task control; the robot is merely a tool to effect it. Sensor integration must bring task measurements into the robot control loop for effective task control. Very few sensor-integrated commands exist in robot programming languages, which are designed to be very general (and hence widely applicable). Currently sensors are sparingly used and are integrated for specific tasks at the application level. Sensor integration necessarily means the introduction of specific task models for interpreting the sensor signals as task information. The most common sensor-integrated command in current languages is the guarded move, which terminates motion on a force threshold. The guarded move is used often because it encompasses a common task function (acquiring a contact) and it employs weak task assumptions which are easily satisfied ('stiff' parts). The key challenge is to create 'general' sensor-integrated primitives which can be re-applied to different but related complex manipulation tasks.
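
Because the guarded move relies only on that weak assumption (stiff parts produce a detectable force rise at contact), it is almost trivial to state. The sketch below is a hedged illustration rather than the thesis implementation; the robot and force-sensor interfaces are assumed for the example.

    import time
    import numpy as np

    def guarded_move(robot, force_sensor, direction, f_threshold=2.0,
                     speed=0.01, timeout=10.0, dt=0.002):
        """Move along `direction` until the contact force component along that
        direction exceeds `f_threshold` (N); return the termination event name."""
        d = np.asarray(direction, dtype=float)
        d = d / np.linalg.norm(d)
        deadline = time.time() + timeout
        while time.time() < deadline:
            if abs(np.dot(force_sensor.read(), d)) > f_threshold:
                robot.stop()
                return "contact"          # event consumed by the skill's state machine
            robot.set_cartesian_velocity(d * speed)
            time.sleep(dt)
        robot.stop()
        return "no_contact"

Note how the task model enters only through two parameters, the approach direction and the force threshold; it is this kind of weak, easily satisfied assumption that makes the primitive reusable across a task domain.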

Primitives can be evaluated with respect to their generality and their power (Figure 2-1). Higher primitive generality means it can be applied to more tasks. Typically, a very general primitive is domain-neutral (e.g. joint moves or Cartesian moves). A very specific primitive is applicable to a single task instance (e.g. a specific instance of a peg-in-hole task). A primitive's power is related to how much uncertainty and ambiguity it can successfully resolve (more uncertainty resolution = more power) and how much information it contains. These two attributes are consistent, since resolving significant uncertainty typically requires more information to be embedded in the primitive. Typical robot-centered primitives (e.g. Cartesian moves) are open-loop with respect to the task and cannot resolve any task uncertainty whatsoever. There is a trade-off between generality and power -- the most general primitives are robot-centered and have little "power" for the task, while powerful primitives for very difficult tasks often have a 'black-box' flavor which severely limits their generality. The middle ground balances generality and power. In the author's opinion, maximizing both generality and power is not feasible -- one is sacrificed for the other. Improving primitive power necessarily requires increasing primitive specificity.

The purpose of this chapter is to propose and develop an approach to identifying manipulation task primitives based on classifying the types of relative motion between two parts. This chapter begins with this classification, followed by an extension to frame-based manipulation task models. Geometrical/mechanical interpretations of the classifications are provided. Task primitive definitions are strongly influenced by resource capabilities. The types of task uncertainty to be considered are outlined, and the connection between skills and primitives is discussed.

Figure 2-1: Primitive Capabilities (generality versus power; move commands and NN 'skills' mark the two extremes)


2.2 Manipulation Task Primitive Classifications

A common method of describing manipulation tasks involves the specification of the position and orientation of frames attached to the parts. This is depicted graphically in Figure 2-2, where W corresponds to the world frame, B to the robot base frame, E to the robot end-effector frame, and P1/P2 to the frames associated with each part. This thesis focuses on the class of manipulation tasks which can be described as the relative positioning of two parts, one of which is held by the robot and one of which is fixtured in the environment. Assembly tasks can be naturally mapped into this class. Other tasks may be (fundamentally) defined differently; for example, grinding is defined by the removal of burrs along an edge. One strategy is to move a grinding wheel or tool along the edge, but this relative motion is not the goal. Instead, a process model is needed to relate the goal (burr removal) to the relative motion which the robot can implement.

A manipulation task primitive (MTP) can be classified by a particular relative motion between two parts. This is a more task-centered definition than that of Michelman and Allen [41] or Speeter [65], who use the term to refer to primitive multi-fingered hand motions useful for manipulation tasks. Speeter generates a collection of coordinated hand joint motions which implement useful finger motions (e.g. grasping or pinching). The effect of these primitives is to collapse a very high-dimensional joint space into a lower-dimensional primitive space which has been carefully designed to capture useful hand joint patterns for tasks.

Figure 2-2: Frame-based Manipulation Task Model (frames W, B, E, P1, and P2)
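
In terms of these frames, the quantity a primitive ultimately regulates is the pose of the grasped part P1 relative to the fixtured part P2, obtained by composing the robot and grip transforms. A small sketch with 4x4 homogeneous transforms makes the bookkeeping explicit; the function names and transform symbols are illustrative only.

    import numpy as np

    def compose(*Ts):
        """Chain 4x4 homogeneous transforms left to right."""
        out = np.eye(4)
        for T in Ts:
            out = out @ T
        return out

    def relative_pose(T_W_B, T_B_E, T_E_P1, T_W_P2):
        """Pose of grasped part P1 expressed in the fixtured part's frame P2:
        T_P2_P1 = inv(T_W_P2) @ T_W_B @ T_B_E @ T_E_P1."""
        T_W_P1 = compose(T_W_B, T_B_E, T_E_P1)   # world pose of P1 reached through the arm
        return np.linalg.inv(T_W_P2) @ T_W_P1

Any error in the grip transform T_E_P1 propagates directly into this relative pose, which is why Section 2.3 stresses careful grasp execution and, where possible, sensing of the fixtured part's world location.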

Our goal is similar, except that we are incorporating sensors and using a simpler motor resource (only 6 DOF). The goal is the same: to discover patterns in the higher-dimensional space which are useful for tasks and capture them in parameterized primitives. Other robotics researchers have recognized the need for such reactive, sensor-based primitives. Smithers and Malcolm [64] combine task-achieving behaviors with a planner for a SOMA-cube world. Hopkins et al [26] suggest the development of a force primitive library for assembly but do not provide a methodology for creating it.

There are 64 (2^6) different possible definitions of relative motion between two parts (each of the 6 DOF has two possible values: 1 = artificial or 0 = natural), and 20 of these are unique. Morris and Haynes [42] identify 17 as "reasonable" for describing assembly constraints. We use a different method of representing the relative motions than Morris and Haynes [42]. Rather than listing each DOF with a 1 or 0, the translation and rotation DOF for an axis are combined into one symbol which encodes the translation/rotation classification (Table 2-1). The possible DOF (T/R) for each axis can thus be expressed as a letter from a four-letter alphabet, and a 3-symbol 'word' represents the DOF of the task frame (permutations of the same three symbols have the same meaning).

Table 2-1: Axis DOF Classifications

  Symbol   Trans/Rot   Meaning
  a        1/1         both free
  b        1/0         translation free only
  c        0/1         rotation free only
  d        0/0         both fixed
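
The counts quoted above fall out of the alphabet directly: an ordered assignment of one of the four symbols to each of the three axes gives 4^3 = 64 possibilities, and treating permutations of the axes as equivalent leaves the 20 unique classes. A few lines of code (hypothetical helper names, not from the thesis) enumerate them and recover the T/R bookkeeping used in Table 2-2:

    from itertools import product, combinations_with_replacement

    AXIS = {"a": (1, 1), "b": (1, 0), "c": (0, 1), "d": (0, 0)}  # (translation, rotation) freedom

    ordered = list(product(AXIS, repeat=3))                      # 4**3 = 64 ordered assignments
    classes = list(combinations_with_replacement(AXIS, 3))       # 20 classes, axis order ignored
    assert len(ordered) == 64 and len(classes) == 20

    for cls in classes:
        t = sum(AXIS[s][0] for s in cls)   # free translational DOF
        r = sum(AXIS[s][1] for s in cls)   # free rotational DOF
        print("".join(cls), t, r, t + r)

For example, the class bbc comes out with T = 2 and R = 1, for 3 free DOF, matching its row in Table 2-2.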

Table 2-2 shows a classification of MTPs by the allowable relative DOF between two parts. Each classification is expressed as a 3-letter word drawn from the four-letter alphabet describing the two DOF (T/R) associated with an axis of a frame. This classification is expressed solely in terms of frame motion, but the constraints themselves are determined by specific aspects of the geometry and mechanics of the task. Each of the classifications can have multiple geometrical/mechanical interpretations and represents a range of task primitives related by that particular motion constraint.

Table 2-2: Relative DOF Classification

  Free DOF   T   R   Class
  6          3   3   aaa
  5          3   2   aab
  5          2   3   aac
  4          2   2   aad
  4          2   2   abc
  4          3   1   abb
  4          1   3   acc
  3          2   1   abd
  3          1   2   acd
  3          2   1   bbc
  3          1   2   bcc
  3          3   0   bbb
  3          0   3   ccc
  2          1   1   add
  2          2   0   bbd
  2          0   2   ccd
  2          1   1   bcd
  1          0   1   cdd
  1          1   0   bdd
  0          0   0   ddd

The motion classification alone is insufficient to fully define an executable primitive. Some preliminary interpretations of the geometry/mechanics for these classifications are provided in this chapter. The constraints may be realized through either contact or non-contact task features, and mixtures of contact and non-contact interpretations are also possible. Specific constraint completion depends on the capabilities of the available resources and is deferred until after the resources are introduced in the next chapter.

The relative DOF classes are shown in Table 2-2. The constraints are illustrated geometrically by splitting the rotational and translational DOF -- two 3D geometric shapes illustrate the constraint type. A sphere represents 3 DOF in rotation or translation, a plane indicates 2 DOF, a line 1 DOF, and a point 0 DOF. These shapes help illustrate the coverage of the primitive classes. For example, 4 DOF can only be made up from (1,3) or (2,2) pairs, and the (2,2) pair is seen through the geometric representation to have two cases: parallel planes and perpendicular planes. By using the shape representations, one can see that the classes completely cover the different possible relative motions.

Table 2-3 shows the MTP classifications organized into meta-classes according to specific geometric and mechanical interpretations of the task constraint types. Many of these are familiar constraints; for others it is difficult to conceive of any mechanism or contact situation at all. The assembly meta-class includes those primitive classes which have well-defined, stable contact interpretations between geometric primitives (point/plane, edge/edge, etc.). Contact interpretations of constraints map naturally onto hybrid control of position/velocity and force: in the direction of the constraint, force must be controlled, since position freedom is constrained. Two of the assembly primitive classes (aab and abb) involve fundamentally non-contact constraint interpretations. The second meta-class is "common" mechanisms, which indicates constraint sets more easily envisioned as mechanisms than as simple contacts; "common" mechanisms include the ball-and-socket joint and the simple hinge/crank. The last meta-class captures the rest of the primitive classes. In general, these are related to relatively rare mechanisms and sometimes very contrived ones. The meta-class decomposition is ad hoc; it is possible that a new interpretation of a primitive class would allow it to change meta-classes.

Table 2-3: Manipulation Primitive Meta-Classes

Meta Class: Assembly
  Class  DOF  Geometric Interpretation(s)
  aaa    6    free
  aab    5    edge parallel to surface (no contact)
  aac    5    1. point against plane  2. edge against edge
  abb    4    surface parallel to surface (no contact)
  abc    4    edge against surface
  bbc    3    surface against surface
  bcc    3    T-slider in slot
  add    2    1. peg in hole (round)  2. slider in slot  3. crank w/ free handle
  bdd    1    1. sq. peg in hole  2. large-pitch screw
  ddd    0    fixed

Meta Class: "Common" Mechanisms
  Class  DOF  Geometric Interpretation(s)
  acc    4    1. ball-in-slot  2. ring in tube (M&H)
  aad    4    ???
  ccc    3    ball-in-socket
  bcd    2    X-slider in slot
  cdd    1    1. hinge/crank  2. small-pitch screw

Meta Class: Unusual Mechanisms or Contacts
  Class  DOF  Geometric Interpretation(s)
  abd    3    ???
  acd    3    ???
  bbb    3    translation mechanism
  bbd    2    translation mechanism
  ccd    2    rotation mechanism

It may also be possible to have the primitive classes straddle meta-classes due to different geometric interpretations. One advantage of the new representation of the task frame DOF is seeing common sub-patterns in different classes. Three of the most difficult classes to assign geometric


interpretations to (aad, abd, and acd) are seen to have the ad pair in common; ad represents a fully free axis paired with a fully fixed axis. Quite contrived contact situations or mechanisms are required to visualize these constraints. This similarity was not evident when representing the task DOF with the conventional method (e.g. 110 110). The reason for classifying manipulation primitives is to help understand the types of primitives which the robot needs to "know how to do" to perform manipulation tasks. The classification helps to guide the primitive identification, definition, and development. From the primitive classifications and our everyday experience, it is clear that robots should know how to operate certain types of simple mechanisms like hinges and sliders, because these mechanisms are commonly encountered in our world. In addition, for assembly tasks, the robot should be adept at operating in the different contact constraint regimes which commonly occur between two parts.

2.3 Manipulation Task Modelling
The MTP classification is insufficient for developing sensor-based primitives because it only captures the expression of the constraint(s) but not their definition. The constraints are defined by specific task features including geometry and mechanics. To develop primitives, the manipulation task model must be augmented to include these features. So far, the manipulation task models have been frames; they must be augmented with information about the task geometry and mechanics. The geometry specification includes coordinate systems (frames) and shape descriptions in those frames. The mechanics specification includes physical laws (e.g. Newtonian mechanics and Coulomb friction) and parameters (masses, stiffnesses, friction coefficients, etc.). The manipulation task action is specified as start and goal regions of the moving part relative to the fixed part. For part mating tasks, the geometric shape uncertainty is small. If it were not, then the task definition would be ill-conditioned (if a peg is too big for the hole, then it makes little sense to define their mating operation because it cannot succeed). The task frame is defined


Figure 2-3: Manipulation Task Modelling (world frame W with parts P1 and P2)

by specific (nominal) geometric task features. There are three cases of task frame definition: 1) defined completely by the moving part, 2) defined completely by the fixed part, or 3) defined by features on both parts. The task frame origin is always fixed relative to the moving part and thus moves with it. The task frame orientation, however, may be partially or completely dependent on the fixed part. If the task frame can be defined completely by the moving part geometry, then a hand-fixed control frame orientation can naturally be used. If the task frame is completely defined by fixed part geometry, then a world-fixed control frame should be used. However, sometimes the task frame is determined by features on both parts (e.g. in edge/edge contact the constraint direction is determined by the edge cross-product). In this case the task control frame should be updated dynamically according to part motion. The geometric pose uncertainty can be significant, especially relative to the mating requirements. This uncertainty can be reduced for the fixtured part by on-line sensing to locate it in the workspace (e.g. using an overhead camera) or by careful initial positioning. For the grasped (or controlled) part, the grasp operation must be carefully executed to minimize the introduction of errors in the grip transform. The grasp operation also requires knowledge of the world location, which can benefit from either sensor-based location or careful initial positioning. We can also employ real-time sensor feedback during the grasp to reduce the grasp error. Since we will often use a hand-fixed task control frame, grip transform errors will introduce task control frame errors. Often the gripper will rely on friction to

complete force closure (e.g. two-fingered grippers with flat gripping surfaces). Task forces can cause slipping of the part in the gripper and lead to additional uncertainty in the grip transform. Understanding the sensitivity of task control to grip transform errors is important. Ideally, primitives should be robust to grip transform errors since these errors cannot be easily removed. The mechanics component of the task model is even more uncertain than the geometric component. First, the physical laws are only approximate representations of the real natural laws. While Newton's law is fairly accurate, the Coulomb friction law that we use is a very simple representation of a very complex phenomenon. The parameters (e.g. mass and friction coefficients) used in these laws are also very uncertain. Any strategy which requires precise knowledge of these parameters is likely to fail in practice. It is important that primitives use mechanics information qualitatively and not rely on accurate prediction or measurement of task forces. The task goal is inherently defined by relative positioning between the two parts. Pose uncertainty makes executing this positioning difficult because even if the absolute control fidelity is very good, the knowledge about the (absolute) controller setpoints is often poor relative to the task requirements. One approach is to recover the absolute part positions in 3D space through sensor measurement (e.g. laser rangefinder or stereo vision). One problem is that the sensor and robot must be very well calibrated -- their world views must coincide very closely. The second problem is that controlling the relative part position through long kinematic chains also requires excellent calibration so that absolute positions are very accurately known. In this thesis, the task execution is based on relative rather than global task measurements by using end-point sensing which reports task-relative position information. This approach is insensitive to calibration errors of both the robot manipulator as well as the sensor/robot combination.



2.4 Current Robot Programming Primitives

Current robot programming primitives are nearly exclusively motor commands which are devoid of task-relevant sensing. Although the robot uses feedback loops around the joints, these feedback loops are robot-centered and not task-relevant. As referenced before, the lowest-level commands to the robot are joint trajectories. Tasks specified this way tend to be very brittle since all reference to the task is lost prior to execution. Common examples include spot welding, spray painting, and pick-and-place assembly. To be successful, this approach requires very tight control over the task environment. The next step is to at least share the motion space -- cartesian space provides a convenient shared ontology for describing both robot motions and task motions. Cartesian motions can be specified relative to a hand frame or a world-fixed frame. A hand frame-based motion is generally 'differential' from the initial hand location. Orientation errors in the motion vector will generate increasing positional errors with distance travelled. Such an approach offers the capability of using local reference and calibration to resolve some uncertainty. However, the translation from task to robot motion still occurs quite early, and the task execution specification in terms of frames is quite abstract, which undermines handling greater degrees of uncertainty. What is missing from the current robot primitives is task-relevance. To achieve task relevance generally requires sensing appropriate task features and constructing feedback algorithms which operate directly on this information. With the exception of the common guarded move, robot programming primitives are devoid of sensing and therefore of any uncertainty resolution capability. Rather than construct planners which attempt to resolve uncertainty on a per-task basis, we are focusing on providing more task-relevant primitives which can resolve uncertainty on their own. Planners could avoid combinatorial explosion by terminating in such primitives. The first requirement of such primitives is to be reactive and closed-loop around task measurements. More sophisticated primitives might include a planning capability which would enlarge their sphere of applicability.


2.5 Resources, Strategies, and Task Uncertainty
Teach and Replay. Considering the resources and the amount of task uncertainty leads to different classes of robotic task strategies. For very small task uncertainty, the approach is to translate the task strategy into robot joint trajectories for replay. The most successful applications of robotics -- spray-painting, welding, and pick-and-place -- take this approach. It requires very precise calibration and accurate robot control and is very sensitive to errors in either. In addition, it can tolerate very little task uncertainty -- only very small perturbations about the 'taught' trajectory are likely to still accomplish the task. Relatively unskilled operators can teach the positions or paths. With this approach, the robot is used as a piece of hard automation replaying the same path over and over again. It exhibits no intelligence or adaptability to task circumstances, so the resulting programs are very brittle. Humans and robots must be carefully separated for safety reasons.

Measure and Move. To deal with more task uncertainty, the 'teach and replay' approach must be abandoned and sensors must be introduced. Since the robot is designed to be an accurate positioning device, the most direct method is to use sensors (e.g. cameras) to recover absolute setpoints for the robot. In addition, Cartesian control is used to allow straight-line motions to be executed, an important class of motions for grasping and for approach during part mating. The problem with using absolute setpoints derived from sensors is that precise calibration is required between the sensors and the robot. Errors in this calibration severely degrade the ability to exploit the robot's accurate control. It can also be difficult to measure the absolute part locations to the accuracy necessary to exploit the robot's accurate control. Limitations in control fidelity and/or setpoint accuracy while near contact make compliant motion necessary to regulate contact forces.

Measure while Moving. To reduce reliance on accurate calibration, task information measured while moving is incorporated into the robot control loop. Doing so makes the robot motion task-driven during the motion instead of discretely determining a (task) setpoint and then executing it open loop (with respect to the task). Faster computation has opened up this alternative approach. The advantage is that the control loop is closed around the task

measurement at high bandwidth so that sensor/robot calibration errors are rejected. One may even be able to use less precise robots (which would be cheaper) because extremely accurate absolute positioning is not required either. With endpoint sensing, more endpoint compliance can be introduced to improve contact stability margins. One reason compliance is not normally introduced is the reliance on rigid-chain kinematics to infer the position of the part -- if the part position that matters to the task can be measured, then some compliance can be introduced while preserving fine controllability of the endpoint part. What emerges in this discussion is the tight coupling between the resource capabilities, strategies, and task requirements (Figure 2-4). The ultimate goal is bridging the task and robot spaces with sensor-integrated commands. This chapter has focused on understanding manipulation task primitives based on relative motion constraints between two parts. But the motion classifications must be augmented with interpretations of the constraint enforcement by specific task features. The resource capabilities (e.g. force and vision sensors) strongly influence the types of strategies that can be employed, which in turn influence the types of

Figure 2-4: Manipulation Task Primitive (task space: motion classification -- "partial" description -- and constraint interpretation; robot space: resource capabilities and strategies; together these yield the complete task definition)


tasks which can be executed. The possible strategies supported by the available resources should influence the task primitive definitions, which are driven by the relative motion classification. The strategies encompass not only the sensor capabilities but also the programmer's understanding of how to apply them to the task domain.

2.6 Complex Tasks, Primitives, and Skills
From an application programmer's point of view, skills and primitives are equivalent: robust, parameterized solutions to recurring tasks. From a 'domain-programming' point of view, skills are composed of primitives. Primitives are developed by hand and are expensive; the idea is to amortize their development cost by reusing them for many different tasks. A skill is typically a solution to a more complex task which does not have a primitive solution. Such tasks may involve more uncertainty than a single primitive can resolve, and the strategy will generally require multiple primitives to be used in a state-machine control architecture. Non-linear behavior is accomplished through 'piece-wise' combination of the primitives. Complex tasks require the composition of existing primitives and/or the creation of new primitive(s). Primitive composition is cheap because it leverages previous work; primitive development is expensive because it involves task modelling, sensor mapping analysis, and detailed code development. From the 'domain-builder's' point of view, skills are divisible, while primitives are not. The task programmer does not care about the divisibility, only what the task function is. For example, a guarded move is a primitive, while a connector insertion is a multi-step skill with a guarded move primitive used in one or more steps. A new task instance will prompt a search to see if a current primitive (or skill) will solve it. If not, it is decomposed into existing primitives to maximize the reuse of work. This approach can be extended one step further by designing complex tasks explicitly as compositions of available primitives. Without a broad set of primitives, however, the range of possible tasks which can be designed will suffer. MTP classification is one tool to try to ensure a primitive set with broad coverage of basic manipulation motions. The same task with different parameters might be considered both primitive and complex.

For example, primitives exist for solving close-tolerance insertions [68], but typically these primitives have fairly small applicability regions (i.e. they deal with relatively small uncertainty regions). So the 'same' insertion task with a larger initial uncertainty region would be considered complex if no primitive existed to solve it. One decomposition of it could include a more limited insertion primitive, with the earlier steps focused on ensuring that the preconditions of the final (limited) insertion primitive were met.

2.7 Summary
In this chapter, a classification for manipulation tasks in terms of relative motion between two parts was proposed. Complete task primitive definition must be driven not only by this classification (to encourage generality), but also by the resource capabilities which define the types of strategies that may be employed. The ultimate goal is to create more task-relevant robot programming primitives which integrate sensors to resolve uncertainty. The cost of doing so is to restrict the primitive set to a particular class of tasks (e.g. assembly). This represents a customization of the robot programming language to a particular task domain. Such an approach has been very fruitful in other domains (Mathematica and Maple for mathematics, and Matlab/Simulink for control systems). Understanding the task domain through primitive development can illustrate the need for specific types of resources (e.g. sensors or algorithms) to solve recurring, primitive tasks. Although primitive development is still expensive (and human-intensive), the reuse of such primitives allows amortization of their development costs over many task instances. Finally, by providing more capable primitives, planning complexity can be controlled. In AI it has been realized for some time that the path to more capable systems lies not necessarily with more powerful search engines, but with more knowledge. This research follows that approach by incorporating more task knowledge into the programming primitives. Building skills which are expressed in terms of the task requires sensor integration. The next chapter discusses specific resource capabilities using force and vision, including the basic control architectures and their characteristics and limitations. Based on these capabilities, a number of specific primitives and algorithms for solving them with those resources are introduced in the following chapter.

Chapter 3
Sensor and Motor Resources

3.1 Introduction
In the previous chapter, a classification for manipulation tasks was introduced to guide sensorimotor primitive development. Fully defining the manipulation task primitives requires considering the specific resource capabilities. Those resources are introduced in this chapter. The 'motor' resource is a 6 DOF robot with a pneumatic two-fingered gripper. Command and control of Cartesian motions is supported along with joint-level control. The sensor resources include a 6 DOF wrist force sensor and a CCD camera with 480x512 pixels at a 30 Hz frame rate. Force and vision are complementary sensors for manipulation since force provides information in contact and vision provides non-contact sensing with a wider field-of-view. The purpose of this chapter is to describe how these resources are integrated. For each sensor, the disturbance sources which corrupt the task information present in the signal are outlined. Based on the resource descriptions here and the MTP classifications in the previous chapter, specific MTP/SMP's are developed in the next chapter. Figure 3-1 shows where the resource capabilities fit into the overall picture of primitive

development.

Figure 3-1: Resource Capabilities (the resource capabilities block within the task space / robot space picture of Figure 2-4, feeding the strategies that complete the manipulation task primitive definition)

3.2 The robot: a ‘motor’ resource
The robot is controlled by individual PID control loops about each joint, and Cartesian controllers supply joint setpoints to these controllers. The stiffness of the joint controllers ensures that the joint setpoints are closely followed. The lowest-level method of commanding the robot is to supply the joint trajectories. Task-relevant commands require a Cartesian controller which provides a common space in which to describe robot motions and task manipulation motions. There are two basic choices for cartesian robot control: resolved-rate and inverse-kinematics. Resolved-rate control involves specifying a velocity vector to drive the endpoint of the robot. The robot Jacobian, J(q), maps the changes in joint angles to

changes in the task frame position and orientation. For non-singular manipulator configurations, the Jacobian is inverted and used inside the control loop to map desired incremental changes in task frame position into incremental changes in joint setpoint values. These joint values are then fed to individual, stiff PID joint controllers. Resolved-rate control is especially useful for trajectories which are determined at run-time -- for example from teleoperation input or sensor feedback. Resolved-rate control is 'open-loop' with respect to the cartesian space because absolute cartesian setpoints are not used (only cartesian velocities). In inverse-kinematics, a task frame defined in the absolute world frame is mapped into a robot joint vector (again, assuming a non-singular configuration of the robot). The absolute cartesian goal position is identified and then a cartesian trajectory generator generates intermediate cartesian goals which are mapped via inverse-kinematics to robot joint values. This method is closed-loop with respect to the cartesian space since absolute cartesian setpoints are used as input. The inverse-kinematic approach makes sense when the manipulation task strategy is defined in terms of absolute world positions. The resolved-rate approach to cartesian control is better suited to on-line determination of robot motion (e.g. teleoperation or sensor-based control), while the inverse-kinematics approach is better suited to replaying known absolute cartesian trajectories. Since significant task uncertainty requires the on-line determination of robot motion, the resolved-rate controller is the natural choice. Task-based velocity setpoints for the resolved-rate controller are generated by sensor measurements, which allows the control to be task-based, rather than robot-based or world-based.
Figure 3-2: Cartesian Controllers around Joint Controllers (an inverse-kinematics controller, using F^-1(x), and a resolved-rate controller, using J^-1(q) followed by integration, each supply joint setpoints qd to stiff PID joint loops around the robot)

Cartesian resolved-rate control is implemented around individual joint control loops. A task frame is specified on the hand -- the location is expressed fixed to the hand frame, while the orientation may be hand-fixed or world-fixed. A 6 DOF velocity vector is specified to move the task frame origin (and hence the robot hand). Errors in task frame orientation will cause accumulating errors during a move. The lack of kinematic redundancy in the robot requires the Cartesian moves to be relatively small to avoid robot singularities. Since we are focusing on fine motion or the gross/fine motion transition, these small Cartesian motions (~10 cm) are adequate. The resolved-rate cartesian controller is implemented with Chimera [67] reconfigurable modules (Figure 3-3). These modules are port-based objects with well-defined input/output and data processing functions. The advantage of the Chimera architecture is that modular real-time software components can be quickly developed and tested. The velocity setpoint may be generated by many different modules. The movedx module is an open-loop module which generates a constant velocity vector for a specified time to execute a cartesian move. Using sensor-based modules to generate the velocity vector will be discussed later.

Figure 3-3: Chimera Resolved-Rate Cartesian Controller (port-based modules movedx, pumaxform, grav_comp, ijac, cartcntl, and puma_pidg connected around the robot)
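The core of the loop in Figure 3-3 is the mapping performed by the ijac and cartcntl modules: a task-frame velocity is pushed through the inverse Jacobian and integrated into joint setpoints for the stiff PID loops. The following minimal sketch (Python pseudocode, not the Chimera implementation; the jacobian argument stands in for the robot-specific routine) illustrates one resolved-rate update.

```python
import numpy as np

def resolved_rate_step(q, v_task, jacobian, dt):
    """One resolved-rate update (the role of ijac + cartcntl in Figure 3-3).

    q        -- current joint vector (6,)
    v_task   -- commanded task-frame velocity [vx, vy, vz, wx, wy, wz]
    jacobian -- function returning the 6x6 manipulator Jacobian J(q)
    dt       -- control period in seconds
    """
    J = jacobian(q)                       # assumes a non-singular configuration
    dq = np.linalg.solve(J, v_task) * dt  # incremental joint change
    return q + dq                         # joint setpoints handed to the PID loops

# An open-loop cartesian move (movedx) simply supplies a constant v_task for
# the time needed to cover the desired differential displacement.
```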



Table 3-1: Chimera Modules for Resolved-Rate Control

  pumaxform -- computes the task frame relative to the world given the task frame definition in terms of the hand frame
  puma_pidg -- computes PID control laws for each joint and writes torque values to the robot; writes out measured joint values
  grav_comp -- computes gravity compensation torques for joints 2 and 3 of the PUMA 560
  ijac      -- computes a 6x6 inverse Jacobian matrix for the Puma 560
  cartcntl  -- maps the task velocity vector into joint increments and adds them to the current joint vector
  movedx    -- given a desired cartesian differential move (e.g. delta x), generates an open-loop constant velocity vector for the appropriate time to effect the move

Like other aspects of the primitive, the controller is defined about a specific task frame. The velocity-based motion command has two parts: translation and rotation. A translational error in the frame placement only generates an error due to the non-zero rotation part of the command; the translation portion of the command remains unaffected although the absolute trajectory is offset. An orientation error in the task frame, however, affects both orientation and translation commands: a pure rotation occurs about the wrong axis, and a pure translation incurs linearly increasing errors with distance travelled. The Cartesian robot move primitives are motor primitives with respect to the task. The sensorimotor primitives developed in this thesis produce a task-based velocity vector as the 'motor' command. Force sensing is incorporated into damping force control to provide compliant motion capability.



3.3 Force Sensing and Control
The robot is an impedance -- it imposes a position on the world and accepts a force in return -- while the world/task is an admittance which accepts a position from the robot and imposes a force on it. The robot controller is built very stiff to reject all force disturbances and achieve its goal position. High controller stiffness is important so that joint values can be supplied and the robot will closely track them in spite of disturbances (e.g. unknown mass, friction, and damping properties of the robot, as well as coupling torques from other joints). The fundamental problem is that two disturbances are present, friction and the contact force, and the robot should accept one but reject the other. The friction disturbance is very difficult to predict and must be rejected to get accurate control. The contact force, however, should not be rejected but complied to -- unfortunately, the joint control loop does not discriminate between these two types of disturbances and rejects both. Force control is necessary when performing tasks involving contact between parts in order to regulate the contact forces which the joint controller views as disturbances. Force sensors can deliver contact information only when contact is attained. Since the robot is very stiff, the necessary task condition is that the parts have sufficient stiffness to generate forces when in contact. The task information in the force signal pertains to contact, but the signal is corrupted by inertial and gravity forces as well. The mass of the gripper can be several pounds, which introduces both gravity loads and significant noise from mass vibration during motions. Processing the force sensor signal to extract task-relevant information requires understanding these sources of signal noise and adequately compensating for or avoiding them.

3.3.1 Sensor Configuration and Signal Processing
Initially the measured force at the sensor frame must be transformed to the task frame. The task frame definition serves as the sensor placement for force sensing, and placement of this task frame is task-dependent. The task frame origin is always fixed relative to the hand and thus moves with the hand. As discussed earlier, the task frame is defined by specified geometric features on one or both parts. Recall that the parts are considered either controlled (by the

robot) or fixtured. Thus the task frame may be defined fixed in the moving part (and to the hand), fixed to the world (and fixtured part), or as a hybrid frame which depends on the contact between the two parts. In this thesis, the task frame is always fixed relative to the held part, which allows a constant transformation between the sensor and part to be used to define the task frame (and which assumes that the part does not move relative to the sensor -- i.e. no slip). Before transforming the force signal to the task frame, it is preprocessed. Note that besides the contact forces, gravity and inertial loads also influence the signal. Three steps preprocess the force signal: low-pass filtering to remove noise, biasing, and transforming from the sensor frame to the task frame. Due to gripper mass beyond the sensor and control-induced vibrations, the signal noise is significant. The force signal is filtered through a first-order, low-pass filter to attenuate high-frequency noise at the expense of introducing some phase lag. A bias term is recorded at the beginning of force control for subtraction from the force signal. If orientations do not change very much during force control, this effectively removes the gravity load from the signal. Orientation changes will introduce a gravity disturbance since the original bias term will not completely cancel it. Since accurate orientation information is available, a gravity compensation feedforward term can be computed based on knowledge of the center of mass (COM). For very light parts, the gripper inertia dominates the COM; for heavier parts a grip transform is needed. But even if an accurate COM is known, measurements on the sensor indicate a variation of ~10% of the gripper weight magnitude over significant orientation changes. Obviously, since the mass is not changing, the measured weight should be constant.

Figure 3-4: Acting Forces (gravity load mg and inertial load ma acting on the gripper; sensor, tool, and task frames; contact forces on the part)

Using a more advanced calibration technique [74] can reduce this variation to ~5%, but completely cancelling the gripper weight with a feedforward term is not easy. When parts are in contact (or very near contact), low velocities are required for (stable) low-gain force feedback, and inertial loads due to acceleration are ignored. Subtracting the constant bias term gives an estimate of the contact force in the sensor frame. The force signal is transformed from the sensor frame to the task frame using (3-1):
F_T = {}^T R_S \, F_S
\tau_T = {}^T R_S \, \tau_S + {}^T R_S \, ( {}^S r_{TS} \times F_S )          (3-1)

where F_S and \tau_S are the measured force and torque, {}^T R_S is the rotation matrix of the sensor frame relative to the task frame, and {}^S r_{TS} is the vector from the task frame to the sensor frame expressed in sensor frame coordinates. To use the measurement in control, it is subtracted from a reference force (in the task frame) to get a force error. Since the force error is mapped to velocities in the task frame, small errors may cause drift by commanding a non-zero velocity. Because of this, and because the error will rarely be exactly zero, a deadzone is used to ignore small errors. The computed error is passed through this deadzone before it generates a compensating velocity. Different spherical deadzones for force and torque suppress the drift effects which small errors would introduce. Unless the force vector penetrates the sphere surface, the result is zero. When the force does penetrate the sphere, the direction is preserved but only the magnitude beyond the sphere is used to generate a force error. This provides smooth operation when leaving the deadzone. Using a deadzone helps to compensate for unknown disturbances but can introduce significant errors for control purposes, especially if very small contact forces are desired (on the order of the deadzone). Ultimately, larger force setpoints improve the signal-to-noise ratio, but some tasks are not suited to large force setpoints (e.g. those involving fragile parts and close-tolerance insertions where wedging can occur). When small force setpoints are necessary, force feedback which is "closer to the task" (e.g. fingertips) is probably a better solution than wrist force sensing.
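A minimal sketch of the sensor-to-task-frame transform of (3-1) is shown below (Python/NumPy; the function and argument names are illustrative, not from the thesis). The inputs are assumed to be the already filtered and bias-subtracted sensor-frame readings.

```python
import numpy as np

def force_to_task_frame(F_s, tau_s, R_ts, r_ts_in_s):
    """Transform a wrist force/torque measurement into the task frame per (3-1).

    F_s, tau_s -- filtered, bias-subtracted force and torque in the sensor frame
    R_ts       -- rotation of the sensor frame relative to the task frame (T R_S)
    r_ts_in_s  -- vector from the task frame to the sensor frame, sensor coords
    """
    F_t = R_ts @ F_s
    tau_t = R_ts @ tau_s + R_ts @ np.cross(r_ts_in_s, F_s)
    return F_t, tau_t
```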

Figure 3-5: Force Error Deadzone (spherical deadzone applied to the error between the reference force Fref and the measured force -Fmez)
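The spherical deadzone described above can be written compactly; the sketch below (an illustrative Python helper, not the thesis code) zeroes errors inside the sphere and passes only the magnitude beyond the sphere, preserving direction.

```python
import numpy as np

def spherical_deadzone(error, radius):
    """Spherical deadzone on a 3-vector force (or torque) error."""
    mag = np.linalg.norm(error)
    if mag <= radius:
        return np.zeros(3)               # inside the sphere: no correction
    return (mag - radius) / mag * error  # keep direction, pass excess magnitude
```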

3.3.2 Control
Our basic force controller is damping force control [37], which translates force errors into velocity perturbations (Figure 3-6). The controller accepts both force and velocity setpoints in the task frame as input (Vd and/or Fd). Selection of force- and velocity-controlled axes is done via 6x6 diagonal selection matrices (SV and SF). Hybrid control specifies either a force or a velocity command along each degree of freedom, which amounts to SV + SF = I; this explicitly sets one of the components to zero. Some tasks, however, require a zero force but non-zero velocity along a particular DOF. A pure hybrid control approach requires the force error to generate the required velocity. Such tasks can benefit from supplying both types of setpoints to the same axis -- zero force plus a model-computed feedforward velocity. Errors in this feedforward velocity will be compensated by the force loop, but the feedforward term will reduce the force disturbance.
Figure 3-6: Damping Force Control (setpoints Vd and Fd, selection matrices SV and SF, force-feedback gain K; the resulting task velocity passes through J^-1(θ) and integration to the PID joint loops, with the force sensor closing the loop against the environment)
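The damping force control law sketched in Figure 3-6 combines the selected velocity setpoint with a damped force error. A compact rendering (illustrative Python, with hypothetical argument names) is:

```python
import numpy as np

def damping_force_control(V_d, F_d, F_meas, S_v, S_f, K):
    """Damping force control: force errors perturb the commanded velocity.

    V_d, F_d -- velocity and force setpoints in the task frame (6-vectors)
    F_meas   -- measured force/torque in the task frame (after the deadzone)
    S_v, S_f -- 6x6 diagonal selection matrices; S_v + S_f = I for pure hybrid
                control, overlapping for force plus feedforward velocity
    K        -- diagonal force-feedback (damping) gain, velocity per unit force
    Returns the task-frame velocity handed to the resolved-rate controller.
    """
    return S_v @ V_d + S_f @ (K @ (F_d - F_meas))
```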

Force control stability is a well-known problem in robotics [73][17]. The crux of the problem is a causality violation -- stiff robots contacting stiff environments lead to marginally stable systems. When contacting stiff environments, low force-feedback gains improve the stability of the system at the expense of response bandwidth. This trade-off means that slower command velocities are necessary to limit the contact force transients. Lowering the stiffness at contact (passively) improves the stability margins of force control but lowers the theoretical bandwidth of force tracking. One of the main obstacles to introducing endpoint compliance is violation of the rigidity assumption used to determine the robot (or tool) endpoint position from joint angles using a kinematic chain. If the end-effector position can be independently sensed, then closed-loop control can be used to reject compliance-induced errors. Many different force control laws are possible and the appropriate selection depends on the specific task. Rather than select a non-parametric control representation (e.g. a neural network [4][23]) capable of non-linear mappings, we will consider linear mappings between force errors and task velocities. Non-linear task control can be realized through event-driven controller transitions. One reason to avoid the non-parametric approaches is to preserve the connection between controller parameters and task design parameters. Determining controller gains is driven by the task mechanics model and the task frame placement. Other researchers [37][10] have studied in detail the synthesis of control laws for many of the manipulation task primitives with a contact interpretation: peg-in-hole, crank, slider, etc. This previous work can be leveraged to develop sensorimotor primitives for those tasks. Because any strategy has necessary preconditions (i.e. initial requirements), we will use both force and vision sensors to enlarge the 'width' of the entrance funnel to the primitive (i.e. make the primitive as 'powerful' as possible). A task frame error will result in mixed control along the task directions; that is, both force and velocity will be controlled along the same task DOF to an extent. For the case of a constant task frame orientation error, task frame errors produce coupling disturbances between force-controlled directions and velocity-controlled directions. Figure 3-7 shows graphically the true task frame along with a task control frame with error in it. Controlling force along the Y and velocity along the X task control frame directions will yield coupling

disturbances because of the task frame error.

Figure 3-7: Task Frame Error (task control frame X, Y rotated by θ from the true task frame)

F_Y = F_c \cos\theta - \frac{V_c \sin\theta}{K_F}
V_X = V_c \cos\theta + F_c K_F \sin\theta          (3-2)

where K_F is the force feedback gain and F_c and V_c are the setpoints provided in the task control frame. F_Y and V_X correspond to the force and velocity realized in the actual task frame. These equations do not include the disturbance effect of a frictional contact.

3.3.3 Trajectory/Setpoint Specification
The force signal is not used to generate a setpoint, but to perturb it. For example, a force setpoint may be specified and the force measurement is used to achieve it. Or a specific velocity trajectory may be commanded and the force feedback will modify it in the face of contact which generates forces. Generally, prior knowledge or non-contact sensing like vision is used to determine the inputs to the force controller. As mentioned in the beginning of this section, force feedback only delivers task information during contact, and orientation changes are limited under force control as well. To enlarge the preconditions of primitives to accommodate more initial task uncertainty, a wider field-of-view, non-contact sensor is needed. The next section discusses the incorporation of visual feedback in the robot control loop to effect task-driven control.



3.4 Visual Servoing
Weiss [76] first studied the incorporation of vision directly into the robot servo-control loop, and today the technique is known as visual servoing. A recent survey can be found in [27]. Image-based visual servoing is used to incorporate vision into the feedback loop because it supports closed-loop control around task errors and because it is robust to calibration errors between the camera and manipulator. The traditional look-and-move approach to incorporating vision into robot control is open-loop: an image is sampled and analyzed and then an absolute setpoint is sent to the robot controller. During the robot move, no vision feedback is used. This approach was largely driven by the processing time of the image (on the order of minutes or longer) and requires very accurate calibration between the robot and the camera. The evolution of faster computing opened the alternative method of visual servoing. Features on the image are used to generate control inputs to the robot at the camera frame rate (30 Hz). The main advantage of this approach is the rejection of calibration errors between the camera and robot due to the closed-loop nature of the vision integration. The price paid for this robustness is simpler vision-processing algorithms and limited use of the vision information. A full-resolution image (512x480) at a 30 Hz frame rate delivers about 7 Mbytes/s of information. Features are selected and tracked which reference only a fraction of this information so that the tracking can be done in real time. Noise and other factors disturb the tracking effort, and redundancy is typically used to add robustness at the price of tracking speed -- there is a delicate trade-off between tracking speed and robustness. Frame-rate tracking speed is achievable now with off-the-shelf, affordable hardware, but improved robustness is needed for real environments. In order to use visual servoing, features must be tracked on the image plane and used to generate errors which are converted to a task frame velocity by inverting the image jacobian. Fixed-camera (rather than eye-in-hand) visual servoing, where the camera is placed to image both parts, supports directly measuring the task error defined by relative part positioning. This section reviews several fundamental aspects of visual servoing: feature selection and tracking, image-plane error computation, image jacobians which map task motions into

feature motions on the image plane, and fusing the commands generated by two different camera views.

3.4.1 Feature Tracking
Features are usually subportions of the image which can be tracked in real time. This is a critical requirement of visual servoing to achieve closed-loop control and reject calibration disturbances. In addition to being trackable in real time, features must be task-relevant and, for primitives, recurring. Task-relevant means that they can report task information which is useful in guiding and monitoring the task action. Recurring means that the features are fairly common and are expected to be useful for many different tasks. This is important so that primitives which are based on the features can be reused. Two types of features have been used in this thesis: SSD windows and corners. SSD tracking involves selecting a rectangular window as the feature and then performing a search in the next image to find the location which minimizes the sum-of-squared differences (SSD) between the template and the image [3]. This rectangular-window feature is considered a "point" feature for control purposes. The feature selected should have strong gradients in both directions to provide strong discrimination information in both directions. The SSD approach is fairly computationally expensive as it requires performing a correlation of the template with a fairly wide area of the image to locate the best match. To speed this up, the image and feature are subsampled and the computation is performed in a 3-level, coarse-to-fine pyramid search. Using this technique, four 8x8 features can be tracked at 40 Hz on the Datacube Max860 system.
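The brute-force core of the SSD search described above can be sketched as follows (illustrative Python/NumPy, not the Datacube implementation); a real tracker would run this coarse-to-fine on subsampled images to reach frame rate.

```python
import numpy as np

def ssd_search(image, template, center, radius):
    """Find the window near `center` that minimizes the SSD with `template`.

    The search covers +/- radius pixels around the predicted location; the
    caller is assumed to keep the search region inside the image bounds.
    """
    h, w = template.shape
    r0, c0 = center
    best_ssd, best_rc = None, center
    for r in range(r0 - radius, r0 + radius + 1):
        for c in range(c0 - radius, c0 + radius + 1):
            window = image[r:r + h, c:c + w].astype(float)
            ssd = np.sum((window - template) ** 2)
            if best_ssd is None or ssd < best_ssd:
                best_ssd, best_rc = ssd, (r, c)
    return best_rc
```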

SSD feature tracking has several problems. First, it is very sensitive to illumination changes. Second, it is really only appropriate for internal features which do not include both foreground and background pixels. And third, the templates are sensitive to rotations, which generally warp their appearance in the image, and to depth/scaling, which changes their size. If a template includes background pixels and they change significantly, a feature template which includes them will differ from the reference template. This is especially apparent for some types of tasks (e.g. assembly) which have defining features naturally occurring on the occluding boundaries of parts and therefore mixing foreground/background pixels. Care must be taken to select/define features with strong gradients in two nearly orthogonal directions to provide the information to locate the template in the image. If the image contains patterns which look similar to the feature template, the tracking algorithm can be easily fooled during the search process. Resampling the feature template window at each tracking update can alleviate some of these problems (e.g. illumination changes and rotations), but is not a general solution to the problems inherent in SSD tracking. In response to these problems, we have developed a different feature tracker based on a technique introduced in [70]. The basic assumption is that the foreground color or intensity of the tracked object has enough spatial and temporal stability to be recognized in successive images (i.e. it is relatively constant). This assumption precludes tracking objects which are textured or otherwise have a mix of intensities. This new tracker is called a corner tracker because it keys on corner feature projections. Corners are good features because they recur, have strong gradients in different directions which supports x,y location in the image, are scale invariant, and can be quickly tracked. In addition, corners respond naturally to rotations about the optical axis. The corner feature is parameterized by six image-plane parameters: x,y is the location of the corner point, and [dx1,dy1] and [dx2,dy2] are vectors defining the two lines of the corner from the center point. The corner-tracking algorithm first finds edge points, combines points into lines, and then combines two lines into the corner. Each line needs three or more edge finders -- currently five are used. The edge finders are 1-dimensional windows used to find the edge by matching the mode of the template foreground to the window at edge regions. This edge finding is thus dependent both on the presence of an edge and, loosely, on the intensity level of the region defined by the edge. A derivative-of-Gaussian filter of width 5 is used to compute edges along the edgefinder window, and these are filtered by a threshold to reject weak edges. The edge finding method relies on both the presence of an edge as well as agreement of the intensity value.


Figure 3-8: Corner Tracking (edgefinder windows placed along the two foreground edges locate the corner point)

p'(x) = -\frac{x}{P} \, \frac{1}{\sqrt{2 \pi P}} \, e^{-x^2 / (2P)}          (3-3)

P = \left( \frac{N-1}{6} \right)^2          (3-4)
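Digitizing this derivative-of-Gaussian filter follows the procedure described just below: pick an odd width N, evaluate p'(x) at x = i + (1-N)/2 for i = 0..N-1, and normalize so the magnitudes sum to one. An illustrative Python version (the names are mine, not the thesis code) is:

```python
import numpy as np

def dog_edge_filter(N=5):
    """Digitized derivative-of-Gaussian edge filter of odd width N, per (3-3)-(3-4)."""
    P = ((N - 1) / 6.0) ** 2                          # variance from the filter size
    x = np.arange(N) + (1 - N) / 2.0                  # x = i + (1-N)/2
    p = (-x / P) * np.exp(-x**2 / (2 * P)) / np.sqrt(2 * np.pi * P)
    return p / np.sum(np.abs(p))                      # sum of magnitudes = 1
```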

To digitize the filter, an odd integer, N, is chosen as the filter size. The filter template is computed for steps i = 0 to N-1 by computing p'(x), with x = i + (1-N)/2, and normalizing the filter coefficients so that the sum of the filter magnitudes is 1. After the edges in a window are found using the above filter and thresholding, one must be selected which divides the foreground/background. Figure 3-9 shows four edge pixels (two are adjacent) along an edgefinder; each is a candidate as the foreground/background divider. The mode (most frequently occurring pixel value) is computed for the region associated with each edge. An edge region is defined as a connected set of pixels beginning at either the first pixel in the edgefinder window or the first pixel after the last edge, and ending with the edge pixel itself. Starting at the beginning of the window, the mode for each edge point (region) is computed. All the pixels up to and including the edge point are included in its region. The most frequently occurring pixel value, or mode, is used to represent the pixel intensity value in each region.

Figure 3-9: Edge Region Pixels (candidate edge pixels and their associated regions along an edgefinder window)

Each pixel has a range of 0-255, or 8 bits, and we reduce this resolution to 5 bits to compute the mode by right bit-shifting by 3 bits. Thus, the region mode values can range from 0 to 31. Once the mode values for each edge region are computed, the edge is selected which gives the smallest error between the region mode and the mode in the last image corresponding to the foreground/background edge region. For two edges with the same error value, the edge is selected which is closest to the end of the edgefinder. Edges with region mode errors greater than 2 are not selected. Once an edge is found, its confidence (0-1) is computed based on the consistency between the edge direction and the line direction. For example, an edge for a horizontal line should be vertical. A missing edge point has a confidence value of zero.

\text{point confidence} = \frac{1}{2} \left( 1 - \cos 2(\alpha - \beta) \right)          (3-5)

where α is the edge angle and β is the line angle. The edge angle is computed as atan(dy, dx), where dy and dx are the gradients in the y and x directions, respectively. The resulting confidence value is maximal (1) when the difference between the edge angle and line angle is +/- 90 degrees, and minimal (0) when the angle difference is zero or 180 degrees. To track a line, multiple edgefinder windows are placed along the line -- five are currently used, which provides redundant information for line fitting. Edgefinder windows have an orientation that is either vertical, horizontal, or diagonal (1x1 step) -- the particular orientation is selected to be nearest the line normal. Edgefinders are placed equally along the line and find an edge point in each. These edge points are then fit using weighted least squares (confidence = weight) to compute the line m,b parameters. Two different parameterizations of the line, y = mx + b or x = my + b, are used to preserve robustness in the least squares computation. Parameterizations are switched at the m = 1 slope. When computing new line parameters, the parameterization is based on the bounding box (dx > dy => y = mx + b) -- the goal is to keep the slope -1 <= m <= 1. To provide robustness against spurious noise, a weighted update of the line parameters is performed according to a confidence value of the line.
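A sketch of the weighted least-squares line fit with the parameterization switch described above (illustrative Python/NumPy; not the real-time implementation):

```python
import numpy as np

def fit_line(points, weights):
    """Weighted least-squares fit of edge points to a line (confidence = weight).

    Chooses y = m*x + b or x = m*y + b from the bounding box of the points so
    that the fitted slope stays within [-1, 1].  Returns (form, m, b).
    """
    pts = np.asarray(points, dtype=float)
    w = np.sqrt(np.asarray(weights, dtype=float))
    if np.ptp(pts[:, 0]) >= np.ptp(pts[:, 1]):    # dx >= dy: y = m*x + b
        A, rhs, form = pts[:, 0], pts[:, 1], "y=mx+b"
    else:                                         # dy > dx: x = m*y + b
        A, rhs, form = pts[:, 1], pts[:, 0], "x=my+b"
    design = np.column_stack([A, np.ones(len(pts))]) * w[:, None]
    m, b = np.linalg.lstsq(design, rhs * w, rcond=None)[0]
    return form, m, b
```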

The line confidence is computed based on the component point confidences (c_i) and has a value in the range of 0 to 1.

\text{line confidence} = \left( \frac{1}{N} \sum_{i=1}^{N} c_i \right)^2          (3-6)

When the equations for each line have been found using the above techniques, they are intersected to find the corner location. Constant lengths along each line are used to place the edgefinders. The assumption is that the image lines defining the corner will always be at least as long as this fixed length. Of course, this assumption may not always hold, and motions (e.g. rotations) may render a corner untrackable by shortening the edge projections on the image plane. This restriction has not been too difficult in practice. Rotations about an axis parallel to the image plane will modify the projection length of edges on the image and so must be done very carefully. The projection length is insensitive to pure translations and to rotations about the optical axis. More sophisticated methods of (adaptively) setting this length could be model-driven. Finally, note that corners have distinct foreground and background components. The type of corner must be identified so that the edgefinder windows can be properly directed with their beginning in the foreground pixels. The corner designation as acute or obtuse cannot be changed during tracking.

Figure 3-10: Acute and Obtuse Corners (foreground region shown for each corner type)

This corner tracker was implemented using a Datacube Maxvideo20 image-processing system. Using a full-resolution image, which takes ~17 ms to transfer, two corners can be tracked at ~38 Hz -- faster than frame rate -- using a Max860 RISC processor in the Datacube

system. Since image transfer time is such a large part of the image processing time, and so few pixels (~250 per corner) are actually used by the algorithm, more efficient image transfer would make it possible to track more corners to support stereo tracking and/or the use of more sophisticated tracking algorithms.

3.4.2 Image Plane Errors
The point of tracking features is to compute image-plane errors and generate a robot velocity vector based on these errors. Generally two corners are tracked in the image which belong to different parts in the workspace. Several errors may be computed from this pair. First, there are the x,y translation errors between the corner origins (3-8a). An angular orientation error can be defined between selected lines on each corner. Finally, the difference in magnitudes of the two corner angles might also be used as an error signal.
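The sketch below computes these corner-pair errors (illustrative Python; the dictionary layout for a corner -- its point and two line-direction vectors -- is an assumption matching the six-parameter corner feature described earlier).

```python
import numpy as np

def corner_errors(moving, goal):
    """Image-plane errors between a tracked (moving) corner and a goal corner.

    Each corner is a dict with 'xy' (corner point) and 'd1', 'd2' (the two
    line-direction vectors).  Returns the x,y translation errors and the
    orientation error between the selected first line of each corner.
    """
    ex, ey = np.subtract(goal["xy"], moving["xy"])
    angle = lambda d: np.arctan2(d[1], d[0])
    e_theta = angle(goal["d1"]) - angle(moving["d1"])
    e_theta = (e_theta + np.pi) % (2 * np.pi) - np.pi   # wrap to (-pi, pi]
    return ex, ey, e_theta
```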

Figure 3-11: Corner Errors

One of the part motions is influenced by robot commands (i.e. the robot grasps one part). The other part is often fixtured, but may be in unpredictable motion (i.e. on an uncalibrated conveyer). Now that feature error measures have been defined based on the trackable features, the relationship between the robot motion and the feature motion must be derived.



Figure 3-12: Fixed Camera Visual Servoing (a fixed camera frame and image plane with tracked and goal features; task frame x, y, z defined on the parts)

3.4.3 Image Jacobian
Tracking the features of interest is an important first step for visual servoing, but the ultimate goal is to control the robot (and hence the task) with the information. Understanding the image jacobian, which describes how the robot motion affects the image-plane feature motions, is a critical step in deriving these control laws. The first step is to understand how motion of a (task) point affects its projection on the image plane. A pin-hole camera model is used to compute the projection of a task point onto the image plane. Differentiating this equation leads to the familiar optical-flow equation which relates task point velocities to image-plane velocities. From Nelson et al [49], the result is:

\begin{bmatrix} \dot{x}_s \\ \dot{y}_s \end{bmatrix} =
\begin{bmatrix}
\dfrac{f}{s_x Z_C} & 0 & -\dfrac{x_s}{Z_C} & -\dfrac{x_s Y_T}{Z_C} & \dfrac{f Z_T}{s_x Z_C} + \dfrac{x_s X_T}{Z_C} & -\dfrac{f Y_T}{s_x Z_C} \\
0 & \dfrac{f}{s_y Z_C} & -\dfrac{y_s}{Z_C} & -\dfrac{f Z_T}{s_y Z_C} - \dfrac{y_s Y_T}{Z_C} & \dfrac{y_s X_T}{Z_C} & \dfrac{f X_T}{s_y Z_C}
\end{bmatrix}
\begin{bmatrix} \dot{X}_T \\ \dot{Y}_T \\ \dot{Z}_T \\ \omega_{X_T} \\ \omega_{Y_T} \\ \omega_{Z_T} \end{bmatrix}          (3-7)

where the image jacobian relates a point feature velocity on the image plane to the task frame velocities for a task frame aligned with the camera frame.
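For reference, the 2x6 matrix of (3-7) can be assembled directly from the feature and point coordinates; the helper below is an illustrative Python rendering (argument names are mine).

```python
import numpy as np

def image_jacobian(xs, ys, XT, YT, ZT, ZC, f, sx, sy):
    """2x6 image Jacobian of (3-7): point feature (xs, ys) on the image plane,
    point coordinates (XT, YT, ZT) in the task frame (aligned with the camera
    frame), depth ZC, focal length f, and pixel dimensions sx, sy."""
    return np.array([
        [f / (sx * ZC), 0.0, -xs / ZC,
         -xs * YT / ZC, f * ZT / (sx * ZC) + xs * XT / ZC, -f * YT / (sx * ZC)],
        [0.0, f / (sy * ZC), -ys / ZC,
         -f * ZT / (sy * ZC) - ys * YT / ZC, ys * XT / ZC, f * XT / (sy * ZC)],
    ])
```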

3.4.4 Control
In this thesis simple proportional control laws are used based on the errors between a moving (and controllable) feature and a frozen (uncontrollable) feature imaged with a single

camera configuration. These simple control laws are used because they are easy to implement and because, at frame-rate feature tracking speeds, they exhibit sufficient robustness to calibration errors of the imprecisely modelled camera/lens/manipulator system. The difference between the features forms the projected task error in the visual sensor space. Image-based control laws rather than pose-based control laws are used to reduce sensitivity to camera/manipulator calibration and sensor modelling errors. Resolvability analysis developed by Nelson [46] drives the simple control law decomposition. For a single camera tracking a single corner, the x,y motion on the image plane and rotation about the optical axis are the most resolvable errors (i.e. they generate the largest sensor signals). To control additional DOF, one should add additional cameras with good resolvability placement, which normally means orthogonal to the first camera. Alternatively, one could track additional features on the same object to control more DOF from a single camera view.

{}^A v = {}^A R_C \, K \begin{bmatrix} e_x \\ e_y \\ 0 \end{bmatrix}
\qquad
{}^A \omega = {}^A R_C \begin{bmatrix} 0 \\ 0 \\ K_\theta \, e_\theta \end{bmatrix}          (3-8)

The image plane velocity is transformed into the task frame by a task/camera transformation which only needs to be approximately correct for stable control. For servoing at roughly constant depth (camera to task), the gains can be made constant to achieve acceptable control. For a depth of ~20 cm, the linear gain is K = 0.002 m/s/pixel and the rotary gain is Kθ = 0.5. The effect of camera/task transformation errors is warping of the image-plane trajectories. For example, if the true task frame is slightly rotated about an image axis from the assumed one (Figure 3-13), then the task velocity will have a depth component and the projected component on the image plane will be shorter. Small depth velocity components are not a problem, but they point out the need for a second camera view if the calibration is especially poor and the task requires tight control in the depth direction. The actual X velocity in

the image plane will be V cos(θ) instead of V. This causes a curved trajectory like that shown in Figure 3-13 rather than the expected straight direction.

Figure 3-13: Task Frame Error Effects on Visual Servoing (true and assumed task frames differing by θ relative to the camera frame)

Using two cameras is necessary when the task uncertainty cannot be resolved from one camera view or when the camera/task calibration is especially poor. If the cameras can be placed orthogonally, then the projection onto the task frame is independent. However, typically the cameras are not placed exactly orthogonally, so coupling occurs. Using the approximate knowledge about the camera placements relative to the task allows one to minimize these disturbances. For example, one camera can be designated 'primary' and used to control x, y, θ on its image plane (projected onto the task). The second camera can be used to control depth relative to the second camera view. This simple decomposition was used to implement control of 3D translation for both approach and grasping control.
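A minimal rendering of the proportional law (3-8), using the gains quoted above for a ~20 cm depth, might look as follows (illustrative Python; the approximate camera-to-task rotation is a required input).

```python
import numpy as np

def visual_servo_command(ex, ey, e_theta, R_ac, K=0.002, K_theta=0.5):
    """Proportional visual servo law of (3-8).

    ex, ey, e_theta -- image-plane corner errors (pixels, pixels, radians)
    R_ac            -- approximate rotation from the camera frame to the task
                       frame (A R_C); it only needs to be roughly correct
    Returns translational and rotational task-frame velocity commands.
    """
    v = R_ac @ (K * np.array([ex, ey, 0.0]))
    w = R_ac @ np.array([0.0, 0.0, K_theta * e_theta])
    return v, w
```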


3.5 Summary

The basic resources were introduced in this chapter for building sensorimotor primitives: resolved-rate cartesian control as the motor resource, and force and vision sensing as the sensor resources. The control strategies using damping force control and visual servoing were described. The force sensor signal processing and the feature tracking used to extract information from the signals for control purposes were outlined. The next chapter explores the application of these resources to solve specific manipulation task primitives.



Chapter 4
Sensorimotor Primitives

4.1 Introduction
The goal of the previous chapters has been to set up the development of sensorimotor primitives for robotic assembly tasks. As discussed in the introduction, the goal is to build an intermediate layer which integrates sensors for a set of related tasks. In Chapter 2, the taxonomy of relative motion was introduced, and in Chapter 3, the resources were introduced. This chapter joins the two into sensorimotor primitives: algorithms implementing specific interpretations of the motion constraints defined by the relative motion taxonomy. As shown in Figure 4-1, a sensorimotor primitive bridges the gap between the robot space and the task space. Unlike an abstract manipulation task primitive class, a sensorimotor primitive is directly executable. Unlike a robot move command, it has task context and meaning because of the interpretations attached to the sensor data.


Figure 4-1: Sensorimotor Primitive

A sensorimotor primitive (SMP) is defined as the solution implementation of a particular MTP. The term sensorimotor indicates the fusing of sensing and action into one command with a task-relevant definition. This thesis focuses on primitives which involve relatively small motions at the gross-fine motion boundary and fine motion [77]. Because of this, the primitives do not consider 'gross' constraints like avoiding joint singularities or avoiding obstacles. Task-driven primitives are defined via sensor information. The goal is to control the task, and the robot is merely a tool to effect it. An SMP integrates domain knowledge (how to achieve the manipulation primitive) with facility capability into a form which preserves task context and is directly executable. This chapter does three things: 1) introduces the structure of sensorimotor primitive implementations of MTP algorithms, 2) defines example MTP's based on the motion classifications and resource capabilities, and 3) implements algorithms to solve the MTP's.


4.2 Sensorimotor Primitive Structure
To execute a specific MTP, an algorithm must be conceived and implemented on specific resources (robots, etc.). There are three basic elements in the execution phase: 1) the controller, 2) the trajectory, and 3) event detectors. In addition, there are initialization and finish phases to execution for configuring the controller transition and sensor configuration, and for "clean-up" when the primitive terminates. The discrete phases in the primitive and in the larger program (sequencing primitives) illustrate the event-driven nature of the control architecture. Central to this approach is event detection to drive the controller changes in different parts of the task strategy.

Figure 4-2: SMP Phases (init, trajectory/controller/event detector, finish)

The desired task motion trajectory (or task action) is defined relative to specific task part features -- so-called 'defining' features. Often this is transformed into a specific robot trajectory on the basis of prior knowledge (such as the beginning locations and shapes of each part). Precompilation of task goals into robot trajectories undermines the ability to compensate for task uncertainty. The goal here is to actually define the task trajectory at run-time through direct measurement of these 'defining' features. This can be achieved through visual servoing by defining an error function from feature locations on the image plane. Of course, such a primitive still requires preconditions to be met: for example, these defining features must be visible to the sensor. Prior assumptions on shape and pose are used to generate open-loop trajectories to satisfy these preconditions. Force setpoints are usually set based on initial information -- not on the force measurements. For example, insertions require zero force/torque setpoints along the constrained directions. Manipulation tasks executed by robots are fundamentally multi-step, and event detectors are needed to drive the strategy changes. The purpose of the event detectors is to capture the

algorithms for detecting specific task events through sensor data processing based on the task model and control strategy chosen. Events can be 'internal' to the strategy, signifying a change within the strategy (e.g. a transition from position to force control), or they may be exiting events indicating the strategy is finished (succeeded or failed). At a minimum, the goal state must be reliably detected to indicate when the primitive has successfully completed. Often the goal is considered attained when the trajectory finishes. However, when deriving the trajectory from task measurements, the termination condition must also be determined by task measurements. Event detectors typically involve computing a scalar value and comparing it to a threshold value to generate an event. The dual variables of the controlled axes are typically monitored for event detection. For example, in guarded moves the velocity is controlled and the force is monitored. Implicit in this is the expectation that the force will increase at the appropriate time (e.g. contact). In general, event detection requires predicting the event projection onto the sensor signal. Predicting this projection is currently more of an art than a science. Complex contact state effects on force signals are notoriously difficult to predict algorithmically. McCarragher and Asada [39] have used qualitative interpretation of force signals along with a dynamic task process model to recognize discrete state changes in a robotic assembly task. This thesis focuses on simpler event detectors which can be re-applied to different tasks. Using all the information about the task model and the control strategy can help in designing event detectors. A normalized correlation measure between the reference and force-perturbed velocity signals detects the presence of a planar motion constraint [43]. In this section the basic structure and components defining a sensorimotor primitive were introduced. The rest of this chapter is organized around the different MTP classifications of relative motion. Geometric interpretation(s) which actually define the constraints are used to derive or implement algorithms to execute the primitive motion. Both contact and non-contact interpretations of the constraints will usually be provided. Contact constraints are imposed by the task mechanics 'naturally' -- the control challenge is to obey these constraints through appropriate force feedback. Non-contact constraints are typically both defined and realized through visual servoing control. In one case, a mixture of contact (force) and vision is used to define the constraint.
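The phase structure described above can be summarized in a small sketch. The following C skeleton is an illustration under assumed names, not the Chimera implementation; it shows the init / trajectory-controller-detector / finish cycle and the event codes a primitive would return to its parent.

    #include <stdbool.h>

    typedef enum { EV_NONE, EV_SUCCESS, EV_FAIL } smp_event_t;

    /* One sensorimotor primitive: controller, trajectory, event detector,
       plus init and finish phases for configuration and clean-up. */
    typedef struct {
        void (*init)(void *ctx);                  /* configure controller, sensors      */
        void (*trajectory)(void *ctx, double t);  /* update setpoints from task features */
        void (*control)(void *ctx);               /* e.g. damping or visual-servo step   */
        smp_event_t (*detect)(void *ctx);         /* scalar measure vs. threshold        */
        void (*finish)(void *ctx);                /* clean-up, release resources         */
    } smp_t;

    /* Run one primitive until its detector raises a terminating event. */
    static smp_event_t smp_run(const smp_t *p, void *ctx, double dt, int max_cycles)
    {
        smp_event_t ev = EV_NONE;
        p->init(ctx);
        for (int i = 0; i < max_cycles && ev == EV_NONE; i++) {
            p->trajectory(ctx, i * dt);
            p->control(ctx);
            ev = p->detect(ctx);
        }
        p->finish(ctx);
        return (ev == EV_NONE) ? EV_FAIL : ev;   /* running out of cycles is a failure */
    }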

4.3 Visual Constraints
In the case of contact, the mechanics of the task provide the constraints -- the primitive's job is to comply with and observe those constraints while keeping forces inside acceptable bounds. However, algorithms for many tasks with contact constraints have fairly small entrance funnels. Consider a peg-in-hole task -- once the peg is in the hole, four DOF are constrained -- but this state can only be entered from a fairly precise non-contact state (or at least a state with fewer DOF constrained). Attaining this 'precise' non-contact state can be specified and controlled through visual feedback. Implementing MTP primitives with visual feedback requires defining the motion constraints on the image plane.


Figure 4-3: Pin-Hole Camera Model of Corner-Tracking

With a single camera configuration, tracking one corner feature in a single image can provide three non-contact constraints: x, y translation in the image plane (the corner x, y location) and the orientation of the corner in the image (rotation about the optical axis). If one specifies both the orientation and the corner angle magnitude, then all three rotations can be fixed. With a single corner, one cannot fix the third translation, and, interestingly, one cannot fix only two rotation DOF. A two camera configuration with each tracking one corner can specify the cases of three translation DOF and two rotational DOF which were not possible

with a single camera configuration. The motion constraints which can be specified with one- and two-camera visual servoing configurations, in which one corner pair in an image generates an error, are summarized in Table 4-1. The square indicates two free DOF, the line one free DOF, and the point zero free DOF. Thus, a single camera configuration can constrain 1 or 2 translation DOF, but not all three.

Table 4-1: Visual Constraint Specification

    # cameras   DOF type   1 DOF specified   2 DOF specified   3 DOF specified
    1           T          square            line              --
    1           R          square            --                point
    2           T          square            line              point
    2           R          square            line              point

Direct features are visible to the sensor. In the case of bringing the corners of two blocks together, both corners are directly visible (and trackable) on the image (Figure 4-4a). However, this is not always the case. Consider an initial positioning goal for a peg-in-hole task (Figure 4-4b) -- the feature of interest (the hole) may not be visible to the sensor. In this case, a secondary feature must be tracked from which the location of the feature of interest can be inferred in the image. In the example given, the distance between the outside corner and the hole must be known. Using indirect features increases the sensitivity to uncertainty in task knowledge (i.e. the distance between the defining (direct) feature and the observable (indirect) feature). Also, when the goal position is actually offset from a feature, this increases our sensitivity to camera placement uncertainty since the projected distance will change with viewing direction. So the offset may be computed based on one viewing direction but the execution performed with a different viewing direction.



Figure 4-4: Direct and Indirect Features (a: direct feature; b: indirect feature)

Figure 4-5 shows the camera pose errors between the "design" camera pose and the actual camera pose. The camera pose error is parameterized by (Xe, Ze, θe). The question is: what is the effect of this camera pose error on the image plane error between two points in the task? By transforming the points into the new camera frame, the new projection equations can be written in terms of the original frame parameters.

$$
x' = -\frac{f}{s}\,\frac{X + \theta_e Z - X_e}{Z - \theta_e X - Z_e}
\tag{4-1}
$$

For zero camera pose error, the result collapses to the original projection point. Also note that for zero rotational and depth (Z) errors, the difference between the image plane projections of two points is unaffected by errors in X. The rotational error cannot be too

Figure 4-5: Camera Pose Errors

large or the projection quality will suffer. Also, the rotation and X error cannot be "too independent" or the features will not appear on the image plane. The image plane error between two projected features is most affected when the change in depth is significant compared with the depth value. A focal length significant relative to the depth also increases the sensitivity to depth errors. In practice, visual servoing was quite robust to camera/task pose errors. The main sensitivity to camera pose errors is in the feature acquisition algorithms. The very simple one implemented in this thesis requires good camera alignment.
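A short numeric sketch of (4-1) illustrates the sensitivity claims above: with zero rotational and depth error, a pure X offset of the camera leaves the relative projection of two points unchanged, while a rotation error changes it only slightly. The lens parameters and point locations are assumptions for the example.

    #include <stdio.h>

    /* Equation (4-1): image coordinate of a task point (X, Z) under camera pose
       error (Xe, Ze, theta_e); f = focal length, s = pixel size. */
    static double project(double X, double Z, double Xe, double Ze, double te,
                          double f, double s)
    {
        return -(f / s) * (X + te * Z - Xe) / (Z - te * X - Ze);
    }

    int main(void)
    {
        double f = 0.0125, s = 1.0e-5;        /* assumed lens/CCD parameters          */
        double X1 = 0.00, Z1 = 0.20;          /* two task points ~20 cm from camera   */
        double X2 = 0.01, Z2 = 0.20;

        /* No pose error vs. a pure X offset of the camera: same relative error. */
        double e0 = project(X2, Z2, 0, 0, 0, f, s) - project(X1, Z1, 0, 0, 0, f, s);
        double eX = project(X2, Z2, 0.02, 0, 0, f, s) - project(X1, Z1, 0.02, 0, 0, f, s);

        /* A small rotation error changes the relative projection slightly. */
        double eR = project(X2, Z2, 0, 0, 0.1, f, s) - project(X1, Z1, 0, 0, 0.1, f, s);

        printf("relative error: ideal %.1f px, X offset %.1f px, 0.1 rad error %.1f px\n",
               e0, eX, eR);
        return 0;
    }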

4.4 Constraining One DOF
4.4.1 aab
This primitive involves specifying one rotation degree of freedom. Since constraining a rotation by contact usually means at least a half-constraint on translation, it is difficult to conceive of a contact situation which constrains only one rotational DOF. Contact situations would more naturally constrain at least a translation and a rotation DOF. A non-contact interpretation of this class is an edge parallel to a surface. This constraint can be implemented using corner-based visual servoing (CBVS) by defining it on the image plane. Our implementation is to specify the orientation of a projected corner -- the control output is rotation about the optical axis. Either one of the lines forming the corner can be used as an orientation indicator, or the corner bisector can be used. By specifying a particular angle on the image plane, a surface is effectively defined which the particular edge is constrained to

Figure 4-6: aab interpretations (a: non-contact, image plane; b: contact)

parallel. All are affected by rotations about the x, y axes as well as the optical z axis. The frame in which the constraint is defined and controlled is the camera frame.

4.4.2 aac
More common than aab, the aac class involves constraining a single translation DOF. There are both contact and non-contact interpretations of this constraint. For contact, there are two interpretations involving 'primitive' contacts: 1) vertex/plane and 2) edge/edge. Either type of contact removes one translation DOF. Realizing either of these contact interpretations requires the use of force feedback -- for example, damping control which accepts velocity and force setpoints in the task frame. The task frames are defined according to the assumed task geometry -- only the axis defining the direction of constraint is actually important to define. The point/plane interpretation requires a task frame axis to be normal to the plane surface. The velocity in this direction is zero and the force setpoint is the desired contact force. The edge/edge task constraint axis is defined by the cross product of the two edges -- this varies dynamically if the parts rotate relative to each other. Using a hybrid control approach, force is controlled in the constraint direction and velocity along the surface directions -- notably, the force or velocity setpoints are effectively zeroed for velocity- and force-controlled directions, respectively. In addition, a non-contact interpretation can be defined on the image plane. Specifying the image plane x (or y) coordinate amounts to constraining the corner projection along one direction in the image plane. In this case, the task frame is the image plane and the constraint is both defined and controlled on the image plane.
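As an illustration of the point/plane interpretation, the following sketch sets up damping-control setpoints with the task-frame z axis normal to the surface: zero commanded velocity and a contact force setpoint along z, and free velocity elsewhere. The data structure, gains, and sign convention are assumptions for the example, not the controller interface used in this work.

    /* Damping-control setpoints for the point/plane interpretation of aac. */
    typedef struct {
        double v_des[6];   /* desired twist in the task frame (vx vy vz wx wy wz) */
        double f_des[6];   /* desired wrench in the task frame                    */
        double kf[6];      /* damping (force feedback) gains                      */
    } damping_setpoint_t;

    static void aac_point_plane(damping_setpoint_t *sp)
    {
        for (int i = 0; i < 6; i++) {
            sp->v_des[i] = 0.0;
            sp->f_des[i] = 0.0;
            sp->kf[i]    = 0.0;
        }
        /* Task frame chosen with z normal to the plane. */
        sp->v_des[0] = 0.01;   /* free sliding along x, m/s                     */
        sp->f_des[2] = -3.0;   /* desired contact force into the surface, N     */
        sp->kf[2]    = 0.001;  /* m/s per N: v_z = kf * (f_meas - f_des)        */
    }

    /* One damping-control step along the constrained axis. */
    static double damping_step(const damping_setpoint_t *sp, double fz_meas)
    {
        return sp->v_des[2] + sp->kf[2] * (fz_meas - sp->f_des[2]);
    }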

4.5 Constraining Two DOF
4.5.1 abb, acc
These two classes contain the (3,1) pairs of DOF. The abb class refers to removal of two rotational DOF while preserving all three translation DOF. A non-contact interpretation of this constraint is the alignment of two surfaces without contact (or the specification of an edge orientation). This primitive can be very useful for aligning insertion axes before mating (assuming they correspond to the surface normals).

Figure 4-7: Edge Controllability (a: good controllability; b: poor controllability, high coupling; c: poor controllability, poor decomposition)

It can be implemented by using the orientation rotation information in our corner feature, but two cameras are needed to provide independent information. If a corner is tracked in each image, two different rotation constraints can be specified -- one in each image. Each camera will produce an angular velocity about the optical axis of that camera. Unless the camera optical axes are orthogonal, there will be coupling between the two rotation commands. If their approximate orientation is known (which is required for control), then one angular velocity vector can be projected to be perpendicular to the other. As before, each angular specification defines a surface in 3-space which the corresponding edge must parallel. Defining two such surfaces constrains the free rotation to their intersection direction. There are two cases to consider. Case 1 is controlling the same edge (or parallel edges) viewed in two images. Ideally, the optical axes should be orthogonal to each other and to the edge. If the optical axes are nearly aligned, then the resulting commands are highly coupled (Figure 4-7b). A minimum tracking length constrains the edge orientation to lie inside a cone centered at the cross product of the cameras' optical axes (Figure 4-7a). A very short task edge (in 3D) will make the cone very steep and narrow and limit the applicability and robustness of the primitive. Alternatively, if the edge lies in the plane formed by the two optical axes (Figure 4-7c), independent control from one camera is effectively lost. Both cameras cannot independently control their edge projections, so this configuration is ill-conditioned. The challenge with this primitive is ensuring that the edge feature remains visible in each image. If the 3D corner angle is very sharp (small), then this is easier than if the corner angle is shallow (large). Case 2 involves controlling two perpendicular edges on the same part via different

images, but the capability is much more restricted. Consider two optical axes which are perpendicular and a task feature which is a 90 degree corner. Suppose two adjacent, orthogonal edges are imaged in the different cameras -- only if their cross-product direction is aligned with the cross-product of the optical axes is a rotation freedom preserved (only 2 rotations specified). A rotation about Z, the axis perpendicular to both optical axes, will not affect the edge projections onto the images. Non-perpendicular optical axes will not affect this constraint. Any corner orientation which is "off the Z axis" fixes all rotation DOF because there is no free rotation which will not modify the edge projections onto the images. The goal is to place the cameras so that each image can produce a control signal which will not disturb the other.

Figure 4-8: Corner Alignment

The acc class specifies two translations while leaving all rotations free. It is quite simple to define this via the TVS primitive with setpoints for both the x and y coordinates of the corner. The part is free to translate along the optical axis of the camera. Rotations are free but are limited by preserving the visibility of the feature in the images -- i.e. one cannot rotate the corner so that either edge lines up near the optical axis.
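The projection of one camera's angular velocity command to be perpendicular to the other's, mentioned for the abb case above, can be sketched as follows; the optical axis directions and commanded rates are assumed values for the example.

    #include <stdio.h>

    static double dot(const double a[3], const double b[3])
    { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }

    int main(void)
    {
        /* Optical axes of the two cameras expressed in the task frame
           (unit vectors; values are assumptions for the example). */
        double zA[3] = {0.0, 0.0, 1.0};
        double zB[3] = {0.96, 0.0, 0.28};     /* not quite orthogonal to zA */

        double wA = 0.02, wB = 0.03;          /* commanded rates about each axis, rad/s */

        /* Camera A is taken as primary; remove from camera B's command the
           component that would disturb A's controlled rotation. */
        double wBvec[3] = { wB*zB[0], wB*zB[1], wB*zB[2] };
        double c = dot(wBvec, zA);
        for (int i = 0; i < 3; i++)
            wBvec[i] -= c * zA[i];

        double w[3] = { wA*zA[0] + wBvec[0],
                        wA*zA[1] + wBvec[1],
                        wA*zA[2] + wBvec[2] };

        printf("combined angular velocity command: [%g %g %g] rad/s\n", w[0], w[1], w[2]);
        return 0;
    }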

4.5.2 abc
The abc class is a (2,2) pair and enforces one translation constraint and one rotation constraint along different axes. The contact interpretation is an edge against a (flat) surface.


Figure 4-9: Edge/Surface Contact

Hybrid control is partitioned according to the contact directions. The true task frame is defined by features on both parts: the surface normal of the fixed part and the edge direction of the controlled part. If a measurement of the surface normal were available, the frame could be dynamically updated based on the edge orientation and the surface normal. Since a surface normal measurement is not available, a hand-fixed frame is used for task control, where one axis is specified to align with the surface normal during contact. Recall that significant orientation changes while in contact are disallowed to minimize gravity-induced force control errors. Alternatively, a non-contact interpretation of the abc class can be implemented with visual servoing primitives. Using a single-camera configuration, the rotation about the optical axis can be constrained along with either x or y translation in the image plane.

4.5.3 aad
This is the other (2,2) pair, with the loss of rotation and translation DOF along the same axis but fully free motion along both other axes. This is an example of a relative motion constraint which is very difficult to interpret geometrically. The difficulty centers around constraining a rotation in one direction without introducing a translation constraint in another direction. It is very difficult to conceive of a device which will persistently impose this constraint set. The strange mechanism shown in Figure 4-10 will momentarily impose

Figure 4-10: aad mechanism

such a constraint. Again, this constraint set can be interpreted through visual servoing by considering a two-camera arrangement. Two cameras are required since a single camera provides translation measurements in the image plane and rotation information about the optical axis which, by definition, is perpendicular to the image plane. Since translation and rotation constraints are required along the same direction, the optical axes of the two cameras must be perpendicular. Assume that camera A will control the translation and camera B the rotation about the

optical axis. The projection direction of camera B's optical axis onto camera A is needed to reduce the coupling. Because the constraints apply along the same axis, the camera calibration must be known fairly accurately. Assume that the cameras are aligned so that the projection of B's optical axis is along the x-direction of A's image plane. Then in image B the orientation of the corner is controlled while in image A its x-position is controlled.

4.6 Constraining Three DOF
4.6.1 bbc, bcc
These are the (1,2) pairs for specifying three DOF. Surface against surface is one contact interpretation of bbc. The hybrid control specification to comply with this constraint set is straight-forward (see, for example, Mason [37]). Assuming Z is the surface normal direction, force is commanded in Z, and torques about X, Y are zero. Class bbc can also be interpreted as a non-contact constraint set implemented with visual feedback. In this case, the abb primitive discussed earlier can be combined with a single translation constraint in one of the images. One bcc class contact interpretation is peg-in-slot. The true task frame is defined by a combination of task features on both parts: the axis of symmetry of the controlled part and the surface normal of the slot wall. The control task frame is again chosen to be hand-fixed, with one axis coinciding with the symmetry axis and the other corresponding to the expected surface normal direction. Again, the hybrid control specification is straight-forward. Assuming the peg axis is Z and the slot wall normal is X, command force in X and Z, and zero torque about Y.

Figure 4-11: bcc and bbc contact interpretations

Errors between the assumed slot wall surface normal and the actual

normal result in disturbances between the force-controlled direction into the slot wall and the velocity commands along the slot. Using a hand-fixed frame limits our rotation about the controlled part's axis of symmetry (since this would introduce errors between the task 'control' frame and the true task frame).
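A minimal sketch of the hybrid-control axis selection for the bbc (surface-on-surface) case follows; the selection flags and setpoints mirror the description above, while the data structure and numerical values are assumptions for illustration.

    /* Hybrid-control axis selection for surface-on-surface (bbc): with the
       task-frame Z axis along the surface normal, Z force and X,Y torques are
       force-controlled, while X,Y translation and rotation about Z are
       velocity-controlled. */
    typedef struct {
        int    force_controlled[6];  /* 1 = force/torque controlled, 0 = velocity */
        double v_des[6];             /* velocity setpoints (used where flag is 0) */
        double f_des[6];             /* force/torque setpoints (used where 1)     */
    } hybrid_spec_t;

    static void bbc_surface_on_surface(hybrid_spec_t *h)
    {
        /* order: vx vy vz wx wy wz */
        int    sel[6] = { 0, 0, 1, 1, 1, 0 };
        double vd[6]  = { 0.01, 0.0, 0.0, 0.0, 0.0, 0.0 };  /* slide along x      */
        double fd[6]  = { 0.0, 0.0, -5.0, 0.0, 0.0, 0.0 };  /* press into surface */
        for (int i = 0; i < 6; i++) {
            h->force_controlled[i] = sel[i];
            h->v_des[i] = vd[i];
            h->f_des[i] = fd[i];
        }
    }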

4.6.2 bbb, ccc
These two constraint classes correspond to (0,3) pairs. The bbb class corresponds to elimination of all rotation DOF while preserving all translation DOF. One contact interpretation of this constraint class is a strange 'translation' mechanism made up of non-round sleeves and sliders which is rarely encountered in everyday life. The task frame is defined by

Figure 4-12: bbb and ccc contact mechanisms

the mechanism topology, although because of the symmetry it is arbitrary. The hybrid control is straight-forward -- position control along all translation DOF, and torque control over all rotations. The bbb class can also be interpreted through visual servoing. Consider Case 2 of the abb non-contact interpretation. If the special configuration defining abb is avoided and the angles of two task edge projections are specified on different image planes, then the orientation of the controlled part is fully constrained. This is because there remains no possible rotation which will not modify at least one edge projection onto an image plane. The ccc class corresponds to the elimination of all translation DOF while preserving all rotations. A very common contact interpretation of this constraint is a ball-in-socket joint. The task frame is naturally defined at the center of the ball -- its orientation is arbitrary. Hybrid control allows position/velocity control of all rotations with force control along all


translations. Again, a non-contact definition of the ccc constraint exists using visual servoing translation setpoints with a two-camera configuration. A single camera can constrain two translation DOF of a corner feature -- using a second camera allows one more translation constraint to be specified. If the cameras are not orthogonal in their views, some coupling will occur between the constraints specified and control disturbances will result. A much more common interpretation exists which combines force and vision for defining the three translation constraints. Consider a part moving along a flat surface. In this case, maintaining contact with the surface fixes one translation DOF and the location on the surface constrains the two remaining translation DOF. So force along the surface normal fixes one DOF and visual feedback fixes the other two DOF along the surface. One problem is that the visual servoing control formulation must be extended to prevent large disturbances to the force controller from the vision controller because of the required camera placement. Figure 4-13 shows that the projection of the image plane intersects with the real surface. Thus, errors in the Y image direction (the vertical image plane coordinate) will map into velocities into/out of the surface which will tend to drive the part into or out of the surface. The velocity projection should be roughly along the real surface instead of the projected image plane.

Figure 4-13: Surface Visual Servoing

Figure 4-14: Camera/Surface Orientation

If we assume that the camera X axis is parallel to the surface (i.e. that we rotate about X to 'see' the surface), then only the Y and Z directions must be coordinated to realize a velocity parallel to the surface (Figure 4-14).


$$
\begin{bmatrix} v_x \\ v_y \\ v_z \end{bmatrix}
= {}^{A}R_{C}
\begin{bmatrix} K_x e_x \\ K_y e_y \\ K_y e_y \tan\theta \end{bmatrix}
\tag{4-2}
$$

Besides coupling Y and Z, a new gain function is needed because the assumption of constant depth, which allowed the use of constant gains, is now violated. A gain factor which depends on the Y position is used to modify the previously constant gains.

$$
K(y,\theta) = \frac{K_0}{\left(1 + \dfrac{s\,y}{f}\tan\theta\right)^{2}}
\tag{4-3}
$$

where y is the pixel coordinate, s is the scale factor (m/pixel), f is the focal length, K0 is the gain for the depth along the optical axis, and θ is the angle between the optical axis and the surface normal as shown in Figure 4-14. The zero Y coordinate corresponds to the center of the image, while the bottom of the image corresponds to +240 and the top to -240. Intuitively, the gains are higher for larger depths and lower for smaller depths. The goal here is to keep the image plane response approximately constant by scaling the velocities up/down depending on the depth estimate.
Figure 4-15: Visual Servoing Gain Factor (K/K0 versus Y pixel value from the top of the image (-250) to the bottom (+250), for surface angles of 30, 45, and 60 degrees)
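The surface-parallel velocity command of (4-2) and the gain factor of (4-3) can be sketched as follows. The camera-to-task rotation is taken as identity, and the pixel scale, focal length, and angle are assumed values, so the numbers are illustrative only.

    #include <math.h>
    #include <stdio.h>

    /* Gain factor of equation (4-3): scales the constant gain K0 according to the
       Y pixel coordinate and the camera/surface angle theta. */
    static double gain(double K0, double y, double theta, double s, double f)
    {
        double d = 1.0 + (s * y / f) * tan(theta);
        return K0 / (d * d);
    }

    int main(void)
    {
        const double PI = 3.14159265358979;
        double s = 1.0e-5, f = 0.0125;        /* assumed pixel size and focal length  */
        double theta = 45.0 * PI / 180.0;     /* camera tilted 45 deg from the normal */
        double K0 = 0.002;

        double ex = 20.0, ey = -15.0;         /* pixel errors of the tracked corner   */
        double y_pix = -120.0;                /* current y coordinate of the feature  */

        double Kx = K0;                        /* x gain left constant                */
        double Ky = gain(K0, y_pix, theta, s, f);

        /* Equation (4-2), camera frame: the z component keeps the motion roughly
           parallel to the real surface rather than to the image plane. */
        double v[3] = { Kx * ex, Ky * ey, Ky * ey * tan(theta) };

        printf("v = [%g %g %g] m/s (camera frame)\n", v[0], v[1], v[2]);
        return 0;
    }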


4.6.3 abd, acd
These classes are (1,2) pairs -- like the aad class, contact interpretations of these two classes are very difficult to generate. The acd class has two translation constraints and one rotation constraint along the free translation axis. This maps perfectly onto a single camera visual servoing configuration where the x, y location of the point is specified and the corner orientation in the image is specified. The abd class has a single translation constraint along with two rotation constraints. Although the single translation DOF does not require two cameras, the general specification of only 2 rotation DOF does.

4.7 Constraining Four DOF
4.7.1 add
The add class has a very common contact interpretation: round peg-in-hole. Whitney [78] showed the appropriate task frame location is at the tip of the peg leading the motion (or beyond it) -- this location positions the center of compliance to provide proper rotation in response to insertion forces and torques. The hybrid control strategy is position/velocity control along the 'a' axis and force/torque control along the 'd' axes. The task frame is naturally defined by the peg symmetry axis and is attached to the peg. A non-contact interpretation of this constraint also exists since a two orthogonal camera arrangement can implement rotation and translation constraints along two orthogonal directions. In theory, a visual implementation exists, but in practice the reliance on indirect features makes the result not robust to task variations. First, the cameras must be orthogonal and preferably aligned with the surface of the hole. The orientation aspect requires only that the occluding edge of the peg be trackable in each image along with an edge on the fixed part which is parallel to the insertion axis. The orientation part will work even if the cameras are not parallel to the surface, but they must be nearly so to preserve robust tracking and rotation controllability. Next, to align the x, y position of the peg, the specification will include an image plane distance describing the location of the defining feature relative to the visible feature -- here the alignment of the cameras with the surface is important, otherwise one does not know

in which direction on the image plane the defining feature lies relative to the visible feature. Also, this distance changes with depth and with the orientation of the camera relative to the task (unless the task has occluding symmetry consistent with the hole -- e.g. a hole drilled in the center of a cylinder). If the visible feature is near the defining feature, the errors can be kept small -- but the further the visible feature is from the defining feature, the more sensitive the result is to two types of errors. First, it is more likely that the two features will not both be contained in the image, and second, larger errors will be introduced by camera/task orientation errors. The root of the problems incurred when trying to servo the peg x, y position is that the defining feature (the hole) does not project onto the images as corners, but as ellipses. Only if the cameras are carefully aligned will the projection be an 'invisible corner' -- this is why a visible 'indirect' feature was needed. This points out the need for additional feature trackers -- e.g. for ellipses. Extending the basic image processing to create an ellipse tracker by using the 1D edgefinder windows should not be difficult. Indeed, more general snake-trackers have been created to track nonrigid bodies [7]. An ellipse feature would have the x, y location, the lengths of the principal axes, and the orientation of the ellipse in the image. An ellipse feature could be used to define in each image both the rotation and translation constraints to bring the peg and hole together in the proper alignment for mating. The peg axis alignment is driven by the ellipse minor axis direction in the image, which would normally be roughly aligned with the Y image direction. The edge point of the ellipse is the target position for the peg edge point

Figure 4-16: Visual Implementation of add

along the major axis direction (Figure 4-17). One problem is that the major/minor distinction of the ellipse axes disappears when the ellipse becomes a circle under a particular viewing configuration. The orientation of the ellipse will then be undefined because no major axis is uniquely defined.

Figure 4-17: Ellipse/Corner Interaction
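As a sketch of the feature representations discussed above, the following struct definitions collect the quantities a corner tracker reports and those an ellipse tracker would need to report; the field names and units are assumptions rather than an existing tracker interface.

    /* Hypothetical feature records for the trackers discussed above. */
    typedef struct {
        double x, y;         /* corner location on the image plane, pixels        */
        double angle;        /* orientation of the corner (or its bisector), rad  */
        double confidence;   /* tracking confidence, 0-100                        */
    } corner_feature_t;

    typedef struct {
        double x, y;         /* ellipse center on the image plane, pixels         */
        double major, minor; /* lengths of the principal axes, pixels             */
        double angle;        /* orientation of the major axis in the image, rad   */
        double confidence;   /* tracking confidence, 0-100                        */
    } ellipse_feature_t;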

4.7.2 bbd, ccd
These constraint classes are the (0,2) pairs. In the case of bbd, a 'translation' mechanism similar to the one proposed for bbb, but with one fewer translation DOF, can be conceived as a contact interpretation. The hybrid control specification and task frame definition are straight-forward. A non-contact interpretation can be guided by corner visual feedback. The abb or bbb approach can be used to enforce the orientation constraints, while the addition of a translation constraint is simply a matter of introducing one x (or y) setpoint in one image. The ccd class has no translation freedom but two rotation DOF. A 'rotation' mechanism can be designed to implement such a constraint set (e.g. a universal joint). A similar mechanism has two yokes: one pivots relative to ground, the second rotates orthogonally relative to the first. The task frame is defined by the two axes which are free to rotate -- the free rotation axes always remain perpendicular. The task frame moves with the final (cylindrical) link. The hybrid control is straight-forward.

Figure 4-18: ccd rotation mechanism

The ccd class also has a non-contact implementation which can be realized through visual feedback. A two (orthogonal) camera configuration is necessary so that 3 translation constraints can be specified through x, y image plane setpoints. Then, in either camera, a rotation setpoint can be provided to constrain one rotation DOF.

4.7.3 bcd
The bcd class corresponds to the (1,1) pair where a single DOF of each type remains along different axes. One contact interpretation of this constraint is a peg-in-slot variation.

Figure 4-19: bcd contact interpretation

Given the task frame, the specification of hybrid control gains directly follows. The non-contact version of bcd through visual servoing encounters the same problem as trying to specify only two rotation DOF using corner features. Specifying two translation constraints is easy with either one camera or two. Specifying only two rotation DOF and leaving one free, however, requires viewing and controlling the same task edge (or two parallel edges) in both cameras. The free rotation corresponds to the rotation about the edge in 3-space. The extent of allowable rotation about this edge is constrained by the edge-shortening effects in the images.

4.8 Specifying Five DOF
4.8.1 bdd
The bdd primitive is one of the two (0,1) pair primitives -- it allows one free translation and has all 3 rotations constrained. A very common contact interpretation is a non-round peg in

hole. The task frame can be defined by either part and we will use the controlled part -- following Whitney [78], the task frame should be located at or beyond the tip of the peg which leads motion into the hole. Hybrid control is straight-forward to specify -- force control along all DOF except the insertion translation direction. Again, a simple non-contact implementation exists under visual servoing. Using the bbb camera configuration to constrain rotations, two translation setpoints can be specified on the image planes (either on a single image or one per image).

4.8.2 cdd
Finally, the cdd primitive has all translations constrained and only one rotation free. A common contact interpretation of this constraint class is a crank or hinge. Ideally, the task frame would be located on the rotation axis -- hybrid control specifies zero velocities and zero force/torque setpoints in all force-controlled directions and a non-zero angular velocity about the hinge axis. However, often the task frame will be located in the hand, in which case mixed control is preferred over hybrid control. In mixed control, both force and velocity setpoints influence a particular DOF control. This is effectively moving the task frame. If a task frame displacement is known, then the translational velocity defined by the rotational velocity about the hinge axis can be fed forward. Since the estimate of the radius will be slightly incorrect, the zero force setpoint will modulate the actual velocity to comply with the constraint. This is effectively moving the task frame to lie on the hinge axis. Again, the cdd constraint has a non-contact implementation under visual servoing. Combining the abb rotation constraint configuration with 3 translation constraints specified in the two images will yield the five constraints in the set.
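A sketch of the mixed-control feed-forward for the hinge case: the translational velocity implied by the commanded hinge rotation, v = ω × r, is fed forward, and a zero-force damping term corrects for the error in the estimated radius. The displacement, gain, and sign conventions are assumptions for the example.

    #include <stdio.h>

    static void cross(const double a[3], const double b[3], double c[3])
    {
        c[0] = a[1]*b[2] - a[2]*b[1];
        c[1] = a[2]*b[0] - a[0]*b[2];
        c[2] = a[0]*b[1] - a[1]*b[0];
    }

    int main(void)
    {
        /* Commanded rotation about the hinge axis (task frame fixed in the hand). */
        double w[3] = { 0.0, 0.0, 0.1 };        /* rad/s about z                       */
        double r[3] = { 0.15, 0.0, 0.0 };       /* estimated hand-to-hinge offset, m   */

        double kf = 0.001;                      /* damping gain, m/s per N             */
        double f_meas[3] = { 1.2, -0.4, 0.0 };  /* measured forces (example values)    */

        /* Feed-forward translational velocity plus zero-force damping correction. */
        double v_ff[3], v[3];
        cross(w, r, v_ff);
        for (int i = 0; i < 3; i++)
            v[i] = v_ff[i] + kf * (0.0 - f_meas[i]);

        printf("commanded velocity: [%g %g %g] m/s\n", v[0], v[1], v[2]);
        return 0;
    }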

4.9 Transition Primitives
The previous sections illustrated how to use the basic force and vision controllers to implement specific interpretations of the various MTP classes derived in Chapter 2. Assembly also requires transitioning between different constraint classes. Doing this requires using

the strategies developed above and coupling in specific event-detectors. There are many transition primitives which can be defined within the manipulation primitive taxonomy, and just a few based on assembly-type contact constraints are defined in this thesis.

4.9.1 Guarded Move
The common guarded move is perhaps the only common sensor-integrated command available in robot programming languages today. It provides a robust method for acquiring a contact from free motion. Expressed in terms of MTP class transitions, a guarded move is aaa -> {aac, abc, or bbc}, where the specific output class is not specifically controlled. The only consistent output result is that a -> c for one DOF, which reduces the free DOF by one translation. The geometric model of the mechanically-stable aac contact can be either point/plane or edge/edge. The SMP algorithm is very simple. A constant velocity is commanded in the direction opposite to the surface normal under cartesian control. The force in this direction is monitored against a threshold to generate the termination event. The initial force reading is noted and used as a bias to be subtracted from subsequent force readings. When the force difference exceeds a pre-set threshold, the 'contact' event is generated and the velocity is set to zero. A maximum move distance is specified -- if the force threshold event is not generated before this distance is completed, the move terminates (velocity is set to zero) and the 'fail' event is generated. The primitive does not necessarily result in contact, though it often does. (The parts may bounce off each other, ending the move with no contact.) Nevertheless, the parts are effectively in contact since a free motion cannot be legitimately commanded in the direction of contact after a guarded move, since contact is (at least) imminent. Task frame errors manifest themselves as errors in the velocity direction. Under this algorithm, they have little effect as long as the two parts contact. Errors in the approach direction shrink the allowable relative pose region of the two parts. The commanded velocity can even be specified relative to the hand frame with little impact on the strategy. Success is guaranteed only if the initial poses of the parts along with the task frame assignment can guarantee the intersection of the two parts during the move. This is fairly weak and

generally easy to achieve with a gross motion.
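The guarded move algorithm above reduces to a few lines; the following sketch (with assumed threshold, speed, and distance values) biases the initial force reading and raises either the 'contact' or the 'fail' event.

    #include <math.h>

    typedef enum { GM_RUNNING, GM_CONTACT, GM_FAIL } gm_status_t;

    typedef struct {
        double v_approach;   /* commanded speed opposite the surface normal, m/s */
        double f_threshold;  /* contact threshold on the biased force, N         */
        double max_dist;     /* maximum travel before declaring failure, m       */
        double f_bias;       /* force reading latched at the start of the move   */
        double travelled;    /* distance commanded so far, m                     */
    } guarded_move_t;

    static void gm_init(guarded_move_t *g, double f_initial)
    {
        g->v_approach  = 0.01;   /* assumed value */
        g->f_threshold = 2.0;    /* assumed value */
        g->max_dist    = 0.05;   /* assumed value */
        g->f_bias      = f_initial;
        g->travelled   = 0.0;
    }

    /* One control cycle: dt is the cycle period, f_meas the force along the
       approach axis; the commanded speed is returned through v_cmd. */
    static gm_status_t gm_step(guarded_move_t *g, double f_meas, double dt, double *v_cmd)
    {
        if (fabs(f_meas - g->f_bias) > g->f_threshold) {
            *v_cmd = 0.0;                 /* stop on the 'contact' event          */
            return GM_CONTACT;
        }
        if (g->travelled >= g->max_dist) {
            *v_cmd = 0.0;                 /* stop and raise the 'fail' event      */
            return GM_FAIL;
        }
        *v_cmd = g->v_approach;
        g->travelled += g->v_approach * dt;
        return GM_RUNNING;
    }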

4.9.2 fstick
This is really an inverse guarded move -- maintaining a translation constraint and generating an event on its loss. Besides acquiring an a -> c constraint, we may also wish to maintain it and detect its loss. The task geometry and frame assumptions are the same as for the guarded move. Since we do not know the exact contact state (aac, abc, or bbc), we assume the most restrictive in terms of free motions (bbc). Thus, we limit ourselves to commanding free motions in the plane and rotations about the plane normal. The task action is maintenance of the constraint on free motion in the direction of the plane normal. A constant force setpoint is applied under damping control to maintain contact. At the beginning of the move, the cartesian position is saved as a bias position. Changes from this position are noted to indicate a loss of contact; this detection algorithm is only valid if the surface does not move. This primitive assumes a motionless surface, and the expectation is that when contact is lost, it is due to the moving part 'falling off' the fixed part. Tiny dP thresholds (less than a few mm) are problematic since they can be easily falsely tripped through motions while in contact. A more sophisticated algorithm for loss detection might look at changes in the force signal as well as motor torques (currents), but noisy signals make extracting information difficult. Task frame errors are essentially errors in the surface normal. The first problem is that the contact may not stick if the applied force falls outside of the friction cone. The expectation is that the force setpoint will not cause motion along the surface (only normal to the surface) -- this will only occur if the force direction falls inside of the friction cone of contact.
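A corresponding sketch of the fstick loss-of-contact detector, with assumed force setpoint and dP threshold values, saves the bias position at the start of the move and trips when the deviation along the (assumed motionless) surface normal exceeds the threshold.

    #include <math.h>
    #include <stdbool.h>

    typedef struct {
        double f_contact;   /* maintained contact force setpoint, N            */
        double p_bias;      /* position along the normal at the start, m       */
        double dp_thresh;   /* deviation indicating loss of contact, m         */
    } fstick_t;

    static void fstick_init(fstick_t *s, double p_normal)
    {
        s->f_contact = -3.0;     /* assumed value                              */
        s->p_bias    = p_normal;
        s->dp_thresh = 0.005;    /* a few mm; smaller values false-trip easily */
    }

    static bool fstick_lost_contact(const fstick_t *s, double p_normal)
    {
        return fabs(p_normal - s->p_bias) > s->dp_thresh;
    }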

4.9.3 Dithering and Correlation
Sinusoidal dithering with correlation-based event detection is one approach to acquiring and detecting a bilateral motion constraint. Lee and Asada [30] use a similar approach to adjust to the minimum stiffness location during an insertion operation. The defining event is the loss of freedom along the surface motion direction. The advantage is that two dithers can be combined to implement a randomization or search function in the plane, useful for finding

holes. The basic idea is to compute a normalized correlation of the commanded and force-perturbed (actual) velocity signals over a dither cycle period (4-4). In the absence of a constraint, the contact force is zero and these two velocities track very closely, yielding a constant normalized correlation of π²/8. Once the constraint is acquired, the correlation drops, and this can be used to detect the constraint-achievement event by comparing the correlation value to a constant threshold. The correlation value is related to the phase difference between the two signals because of the normalization.

$$
C = \frac{N \sum_{i=0}^{N} f\!\left(\frac{2\pi i}{N}\right) g\!\left(\frac{2\pi i}{N}\right)}
         {\sum_{i=0}^{N} \left|f\!\left(\frac{2\pi i}{N}\right)\right| \,
          \sum_{i=0}^{N} \left|g\!\left(\frac{2\pi i}{N}\right)\right|}
\tag{4-4}
$$

$$
C = \frac{\pi^{2}}{8}\cos\phi
\tag{4-5}
$$
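For illustration, the following sketch computes a correlation of this form for several phase differences. The normalization shown -- the mean product divided by the product of the mean absolute values -- reproduces the π²/8 constant for identical sinusoids and is an assumption consistent with (4-4) and (4-5), not necessarily the exact code used in the experiments.

    #include <math.h>
    #include <stdio.h>

    #define N 256

    /* Correlation of commanded (f) and force-perturbed (g) velocity samples over
       one dither period, normalized so the result is independent of amplitude. */
    static double correlation(const double *f, const double *g, int n)
    {
        double sfg = 0.0, saf = 0.0, sag = 0.0;
        for (int i = 0; i < n; i++) {
            sfg += f[i] * g[i];
            saf += fabs(f[i]);
            sag += fabs(g[i]);
        }
        return n * sfg / (saf * sag);
    }

    int main(void)
    {
        const double PI = 3.14159265358979;
        double f[N], g[N];
        double phases[3] = { 0.0, PI / 4.0, PI / 2.0 };

        for (int k = 0; k < 3; k++) {
            for (int i = 0; i < N; i++) {
                double t = 2.0 * PI * i / N;
                f[i] = 0.02 * cos(t);                  /* commanded velocity       */
                g[i] = 0.02 * cos(t + phases[k]);      /* force-perturbed velocity */
            }
            printf("phase %5.2f rad -> C = %.3f (pi^2/8 = %.3f)\n",
                   phases[k], correlation(f, g, N), PI * PI / 8.0);
        }
        return 0;
    }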

A simple mechanics model illustrates what is happening. Consider the fact that the constraint really has finite stiffness, as does the force sensor, and assume that there is no gap. The motion of a point connected to a grounded spring is commanded and the command is modified through the force in the spring.


Figure 4-20: Mechanics of Constraint

The equation of motion for the endpoint of the spring under force control is:

$$
\dot{x}(t) = V\cos(\omega t) - Kx
\tag{4-6}
$$

where K is the combined spring constant and force feedback gain.


Figure 4-21: Laplace Transform of Command/Reference Velocities (Vin = V cos(ωt) passes through 1/(s+K) to give Xout, which is differentiated by s to give Vout)

The Laplace transform shows the relationship between the input (command) velocity and the force-perturbed ('reference') velocity. Since we use a normalized correlation function, we care about the phase difference between the two signals.

$$
\phi(\omega) = 90^{\circ} - \tan^{-1}\!\left(\frac{\omega}{K}\right)
\tag{4-7}
$$

This phase equation is intuitively consistent. For very large stiffness K, the phase difference approaches 90 degrees and yields zero correlation. For small stiffness K, the phase difference approaches 0, which yields maximum correlation. The effect of a gap is to reduce the effective stiffness. A gap can be modelled as a dead-zone in the stiffness function (Figure 4-22). The effective (linearized) stiffness depends on the amplitude of x. For small x that remains within the gap, the stiffness is zero. For very large x, the stiffness approaches the upper bound of K. If we linearize the stiffness based on equivalent spring energy of the linearized model and the non-linear spring model at maximum deflection x = A, we can gain some intuition into the effects of finite gaps on the correlation value. Equation (4-8) is derived from setting the spring energies equal at the maximum amplitude A > g (where g is the gap size).

Figure 4-22: Stiffness with Gap


$$
K_{AV} = \frac{K(A-g)^{2}}{A^{2}}
\tag{4-8}
$$

where K_AV is the linearized stiffness. If we let g = nA, where n < 1, we can write:

$$
K_{AV} = K(1-n)^{2}
\tag{4-9}
$$

The amplitude of motion can be written as a function of this linearized stiffness based on the Laplace transform:

$$
A = \frac{V_0}{\sqrt{\omega^{2} + K(1-n)^{2}}}
\tag{4-10}
$$

If we pick some typical values for V0, ω, and K, we can see the effects of different gaps (n) on the value A. Consider V0 = 0.02 m/s, ω = 3 rad/s, and K = 10 (reasonable since the stiffness of the sensor might be ~10,000 N/m, but the feedback gain is 0.001). The following experimental plots show the correlation drop when a constraint is acquired and the correlation value when there is no constraint. The significant reaction forces perturb the input velocity and introduce a phase shift which is detected. In this case, the dithering continued and lost the constraint. Normally dithering should stop when the constraint is achieved.

Table 1: Gap Effects

    n      A        g         (1-n)^2
    0.1    0.0048   0.00048   0.81
    0.5    0.0059   0.0029    0.25
    0.9    0.0066   0.006     0.01
    1      0.0067   0         0



Figure 4-23: Correlation for Constraint Detection (correlation value, commanded and reference velocities V_CMD and V_REF, and contact force Fx versus time)

4.9.4 Dither Combinations
Dithering and correlation have been joined into a linear/rotary combination or a linear/linear combination. If one dither frequency is f and is executed for N cycles, the other dither frequency should be chosen as (N+1)f/N and executed for N+1 cycles. This will ensure a Lissajous pattern is executed which explores the parameter space. Why dither like this instead of just randomizing? Because it combines the exploration search with the event detection through the correlation. Also, the detection uses many data points and exploits knowledge about the trajectory input and controller effects in detecting the constraint acquisition event. In addition, a randomization requires an explicit step to determine the termination event.
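A small sketch of the frequency selection rule above: with a base dither at f for N cycles and the second dither at (N+1)f/N for N+1 cycles, both dithers end together and trace a closed Lissajous pattern. The amplitudes and base frequency are assumed values.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double PI = 3.14159265358979;

        int    N  = 3;
        double f1 = 0.5;                  /* base dither frequency, Hz            */
        double f2 = (N + 1) * f1 / N;     /* second dither frequency              */
        double T  = N / f1;               /* common period of the combined pattern */
        double A1 = 0.004, A2 = 0.004;    /* dither amplitudes, m                  */

        for (int i = 0; i <= 20; i++) {
            double t = T * i / 20.0;
            double x = A1 * sin(2.0 * PI * f1 * t);   /* first dither axis  */
            double y = A2 * sin(2.0 * PI * f2 * t);   /* second dither axis */
            printf("t=%5.2f  x=%+.4f  y=%+.4f\n", t, x, y);
        }
        return 0;
    }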


Figure 4-24: Lissajous Patterns (n = 3 and n = 5)

4.10 Summary
In this chapter, the basic structure of a sensorimotor primitive -- trajectory, control, and event detection -- was introduced. Damping force and corner-based visual servoing control resources were applied to instantiate MTP solutions to the different motion classifications under both contact and non-contact interpretations of the constraints. The hybrid control strategies necessary for contact interpretations were outlined, and the implementations of non-contact interpretations of the constraints using visual feedback were discussed. Finally, a few transition primitives involving the design of event-detectors to robustly detect the transition from one constraint type to another were discussed. In the following chapter some of these primitives are used to construct skills for various assembly tasks.


Chapter 5
Robot Skills

5.1 Skills as Primitive Compositions
To address the problem of skill representation, the skill is expressed as a finite-state machine (FSM), which naturally supports the implementation of complex decision trees. The non-linear nature of a typical skill control law is realized through discrete control transformations. Rather than try to capture the entire strategy in some non-linear function or mapping, a segmented strategy achieves the non-linearity. Task events cause state transitions to occur during the strategy, at which time a fundamentally different goal can be pursued with the subsequent controller, trajectory, and event-detector changes. A skill makes demands of both the task and of the resources. The idea is to have the skill make modest motor demands easily met by a large class of robots -- i.e. the ability to execute straight-line translation motions as well as rotations about a hand-fixed axis. The sensor requirements are also fairly general: a wrist force/torque sensor and an external CCD camera. However, the primitives have specific requirements regarding the task projection onto

the sensors. This manifests itself as sensor placement relative to the task as well as specific sensor features which the task must produce. This task feature dependence is what specializes the primitive, so it is important that reusable features are selected. The features may be artificial (i.e. fiducials) or natural (e.g. corners). The task feature projection assumption addresses the problem of extracting task-relevant information from the sensor signal. It places constraints on the task/sensor interaction according to the processing algorithms used -- i.e. the features must be visible to the sensor. This visibility requirement may spawn several different preconditions. For example, a vision sensor may require that the lighting is sufficient to see the feature, that the feature be unique enough in the scene to identify and track, and that the feature lie within the field-of-view of the sensor. These primitive requirements are passed on to the skill as its requirements (at different states). Implementing these finite-state machines required extending the Chimera reconfigurable software framework developed by Stewart et al [67]. This extension is described in the next section, and then example skills are described with experimental results presented.

5.2 Chimera Agent Level
Implementing these event-driven skills required extensions to the Chimera real-time reconfigurable software environment (Figure 5-1). The Chimera SBS level provides an excellent framework for creating modular real-time software for implementing periodic task modules [66] and is an enabling technology for this research. Without the ability to construct modular, reusable real-time software, skills could not be efficiently composed. The periodic modules can be naturally combined into controllers which have a 'continuous' nature. The higher-level, "meta-control" of these modules was missing. Robot programming is inherently event-driven -- with different parts of the task program requiring different controllers, trajectory generators, and event detectors. The task strategy is segmented into different phases or states. This characterization points to the need for an asynchronous event-driven software level to complement the periodic SBS level. The SBS



Figure 5-1: Chimera Reconfigurable Software Framework

level supports writing smaller, modular pieces of real-time code which can be reused -- this has the advantages of keeping the modules small and easing their development. However, composing more complex functions requires effectively and quickly managing sets of these modules. The Chimera Agent level addresses the need for this "meta-control" of module configurations. Whereas the SBS level is fundamentally modelled after periodic, port-based agents which process data, the Agent level is based on asynchronous, event-driven agents which process events and data. And whereas SBS modules have effectively two operating states (ON/OFF), agents in general have multiple, user-specified operating states. Generally an agent which manages n modules can have up to 2^n states, though in practice the number is usually much smaller (e.g. the controller agent manages 15 modules but has only four states: off, joint, cartesian, and damping).


Figure 5-2: 3rd Level and 4th Level Interaction (4th-level FSM's process events; agents process events and data; 3rd-level modules process port-based data and receive control messages)

SBS module management has several requirements. Encapsulating sets of these real-time modules into higher-level agents which can persist throughout an application provides the ability to hide agent complexity and details from its clients. Managing these module sets, or configurations, requires general-purpose computation to support complex decision making. Configuration decisions may also require access to the SBS state variable table to read relevant data from the 3rd level. Collections of modules usually require the module parameters to be shared and/or coordinated, and the Agent level supports this as well. The highest-level programs must be quickly composable from existing primitives and modules without requiring a compile/debug cycle. The finite-state machine interpreter supports fast connection of events to configuration commands, and the graphical user interface facilitates construction of these state machines. Two new objects implement the Agent level: agents and finite state machines. Both are designed to be event-driven entities with multiple internal states. Their periodic components are executed as SBS port-based object modules and their higher-level, asynchronous coordination functions are executed in response to discrete events or commands. Agents are written in "C" code and can directly control SBS modules. State machines do not have general



Figure 5-3: Chimera Agent Level

purpose computation capability -- they implement command and event transitions, but they can reference SBS modules, agents, and other state machines, and they represent the 'top level' of programs. One of the key ideas implemented in the Agent level is that of object persistence. Certain agents, for example controllers, persist throughout a task strategy but require different states during different strategy states. Also, the modules which connect directly to hardware (i.e. sensor device driver interfaces) must be able to service multiple clients. The Agent level handles this by implementing a connection scheme whereby an object can be connected to by multiple higher-level objects. For example, object "A" may instantiate object "B" and then object "C" may connect to "B". If object A is destroyed, B will persist since C is connected to it. In the case of SBS modules, when an agent/fsm turns an SBS module "ON" it becomes its parent and assumes authority over the object control. Once an object relinquishes control, the parent is reset and another object is free to control the child object. The general-purpose computation ability available to agents means that complex algorithms can be implemented in them. Thus, these objects represent an extensible capability

for constructing more complex, sophisticated agents. In particular, planning algorithms or more sophisticated error-recovery algorithms can be implemented as agents. Our controller agent provides joint, cartesian, and damping force control capabilities through shared SBS modules. Prior to the development of the Agent level, transitioning between different control modes was fraught with peril, as very specific on/off sequences must be observed to prevent catastrophic behavior. In fact, early on, an accident caused by incorrect module sequencing during a controller transition broke the gripper fingers when the gripper plunged into a hard surface. Dither/correlation agents which encapsulate orthogonal dithers for implementing a localized search/randomization coupled with event-detection have also been implemented as 4th-level agents. FSM's are collections of states and transitions which move between states. Each state has two lists associated with it: a command list and an event list. The command list is an ordered list of control commands which can reference any type of object (module, agent, or fsm). FSM's can be nested up to 32 levels deep. A command has the form:
type cmd object parameters

A command example would be "sbs on movedx L 0 1 0 0.01 0.02". The movedx is a simple differential move command which takes a code (L = linear move), an axis (0 1 0), a displacement (0.01 m), and a speed (0.02 m/s). An fsm event has the form:
type object signal nextstate [exitcode]

The last field is optional and only applies if the nextstate is 'halt' -- it indicates the "success" of the fsm termination. The signal is a 32-bit value which can be generated by any of the three types of objects. Generally, agents will trap events which their child modules generate and handle them. However, they may forward the event to their parent state machine for handling or generate a different event depending on the situation.

Five default FSM states exist: create, start, halt, destroy, and reset (Figure 5-4). The user is free to define additional states to compose an event-driven program. The command list is executed upon entry to the state. The event list defines the transitions away from the state as connected to specific events. Events may be generated by either SBS modules, agents, or other fsms. When entering the halt state, the fsm is effectively 'finished' and informs its parent of the outcome through an exit code (1 = success, 2 = failure). Whatever state transitions into the halt state supplies the exit code (the default is 2). A graphical user interface, the Skill Programming Interface (SPI), written in tcl/tk, is available to support the rapid composition of event-based programs. The user interface is described in more detail in the Appendix of this thesis. The output of the SPI is a text configuration file which is loaded for real-time execution.

Figure 5-4: FSM Default and User States
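As a concrete illustration, the following is a minimal C sketch of how such a state table might be represented and dispatched. The structure names, field layout, signal values, and the example "approach" state are assumptions made for illustration rather than the actual Chimera implementation, though the command and event fields mirror the forms given above and the halt exit-code convention (1 = success, 2 = failure).

#include <stdio.h>
#include <string.h>

#define MAX_CMDS   4
#define MAX_EVENTS 4

typedef struct {            /* "type cmd object parameters" */
    const char *type;       /* "sbs", "agent", or "fsm"     */
    const char *cmd;        /* e.g. "on", "off"             */
    const char *object;     /* e.g. "movedx"                */
    const char *params;     /* e.g. "L 0 1 0 0.01 0.02"     */
} Command;

typedef struct {            /* "type object signal nextstate [exitcode]" */
    const char *type;
    const char *object;
    unsigned    signal;     /* 32-bit event signal                       */
    const char *nextstate;
    int         exitcode;   /* used only when nextstate is "halt"        */
} Event;

typedef struct {
    const char *name;
    Command cmds[MAX_CMDS];
    int     ncmds;
    Event   events[MAX_EVENTS];
    int     nevents;
} State;

/* The command list is executed on entry to the state. */
static void enter_state(const State *s)
{
    for (int i = 0; i < s->ncmds; i++)
        printf("issue: %s %s %s %s\n", s->cmds[i].type, s->cmds[i].cmd,
               s->cmds[i].object, s->cmds[i].params);
}

/* The event list maps incoming signals to transitions; an unmatched
 * event would be forwarded to the parent fsm. */
static const Event *dispatch(const State *s, const char *object, unsigned signal)
{
    for (int i = 0; i < s->nevents; i++)
        if (s->events[i].signal == signal &&
            strcmp(s->events[i].object, object) == 0)
            return &s->events[i];
    return NULL;
}

int main(void)
{
    /* Hypothetical user state: issue a differential move, transition on
     * completion, or report failure through the halt exit code (2). */
    State approach = {
        .name    = "approach",
        .cmds    = {{ "sbs", "on", "movedx", "L 0 1 0 0.01 0.02" }},
        .ncmds   = 1,
        .events  = {{ "sbs", "movedx", 0x01u, "touch", 0 },
                    { "sbs", "movedx", 0x02u, "halt",  2 }},
        .nevents = 2,
    };
    enter_state(&approach);
    const Event *e = dispatch(&approach, "movedx", 0x01u);
    if (e)
        printf("transition to state: %s\n", e->nextstate);
    return 0;
}

In the real system the SPI emits an equivalent description as a text configuration file rather than compiled structures.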



Figure 5-5: Skill Programming Graphical User Interface

5.3 Example Skills
5.3.1 Square Peg Insertion
No robotic manipulation thesis would be complete without a reference to the canonical peg-in-hole task. The task is a square peg insertion into a square cutout hole. There is essentially no peg/hole clearance -- the fit is actually a slight press fit, and releasing the peg will not cause it to fall into the hole. The starting position is above the surface with the insertion axis approximately aligned. Two similar skills are implemented to perform this task, and both assume approximately correct insertion-axis alignment to start. Skill A also assumes rotational alignment about the insertion axis, while skill B allows error in this rotation. Both use force and vision feedback to execute the task. A guarded move down begins the strategy. After contact, corner features are selected on the two parts and a vision rotation move aligns the peg edge with the hole edge. Once rotational alignment is attained, a translation is executed under visual feedback to bring the two part corners together. This translation is a combined force and vision move -- vision commands motion in the plane while force maintains contact with the surface. The projection visual servoing primitive is used to mitigate the disturbances that visually-driven motions cause in the force-control direction. Once the corners are brought close together (defined on the image plane as a 'target' visual error), the peg is tilted into the hole and a guarded move acquires the side contact. The peg is pushed against the hole side while slowly straightening and introducing a small rotational dither about the peg insertion axis. When the rotational error was initially zero, the insertion was much easier. With rotational error, the error was not always completely removed due to noise in the feature tracking -- the residual error might still be a few degrees, which was significant enough to prevent mating. So for skill B, additional dithering was introduced both when straightening and during insertion to help correct this residual error.

Figure 5-6: Square and Triangular Peg Insertion Tasks
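A minimal sketch of the kind of rotational dither skill B superimposes on the straightening motion is given below: a small sinusoidal angular velocity about the insertion axis is added to the nominal command so that a residual alignment error of a few degrees can be absorbed during mating. The amplitude, frequency, and update rate are illustrative values, not the parameters used in the experiments.

#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define DITHER_AMP_RAD_S 0.05    /* peak dither angular speed (assumed) */
#define DITHER_FREQ_HZ   2.0     /* dither frequency (assumed)          */

/* Commanded angular velocity about the insertion axis: the slow
 * straightening rate plus a small sinusoidal dither. */
static double dithered_rate(double nominal_rate, double t)
{
    return nominal_rate + DITHER_AMP_RAD_S * sin(2.0 * M_PI * DITHER_FREQ_HZ * t);
}

int main(void)
{
    const double dt = 0.01;                    /* 100 Hz update (assumed) */
    for (double t = 0.0; t < 0.1; t += dt)
        printf("t=%.2f  w=%+.4f rad/s\n", t, dithered_rate(-0.02, t));
    return 0;
}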

Figure 5-7: Block Task Frame Assignment



Figure 5-8: Block Insertion Skill (FSM with states including touch, vis_rotate, vismove, tilt, straighten, and push_in, in addition to the default create, start, halt, destroy, and reset states)

Figure 5-9: Block Skill "A" Velocities (linear and angular velocity components vs. time; gmove, feature acquisition, and visual move phases annotated)



Figure 5-10: Block Skill "A" Forces (contact force and torque vs. time; low contact forces during the visual move)

Figure 5-11: Block Skill "A" Pixel errors and confidence (x and y pixel errors and feature tracking confidence vs. time; poor tracking robustness annotated)



Figure 5-12: Block Skill "B" Velocities (linear and angular velocity vs. time; gmove, feature acquisition, visual translation, visual rotation, and dithering phases annotated)

Figure 5-13: Block Skill "B" Contact Forces (contact force and torque vs. time; low contact forces during visual moves)



Figure 5-14: Block Skill "B" Visual Error and Tracking Confidence (x and y pixel errors during the rotation and translation phases, and feature tracking confidence vs. time)

Figure 5-15: Block Skill "B" Rotation Error (visual rotation error in degrees vs. time)

5.3.2 Triangular Peg Insertion
The triangular peg is shown in Figure 5-6 along with the square peg. The triangular peg insertion was made easier by slightly enlarging the hole with a file and producing a chamfer on the hole. This gave a couple of millimeters of clearance instead of the slight press fit of the square peg task. We used exactly the same skill program for this task as for the square peg task.

Figure 5-16: Triangular Peg Task

Figure 5-17: Triangular Peg Velocities (linear and angular velocity components vs. time)



Figure 5-18: Triangular Peg Forces (contact force and torque vs. time)

Figure 5-19: Triangular Peg Visual Errors and Tracking Confidence (x and y pixel errors and feature tracking confidence vs. time)



Figure 5-20: Triangular Peg Visual Rotation Error (error in degrees vs. time)

The peg insertion tasks demonstrate the use of the combined force and vision primitive for constraining all three translation DOF. The remaining tasks are all connector insertion tasks. The task strategy implemented by primitives results in a command velocity, Vcmd, which is perturbed by the accommodation controller in response to contact. The perturbed velocity, Vref, is used to generate joint setpoints for the robot joint controller. The experimental results for the BNC and D-connector tasks are shown as plots of Vref. Before presenting the results of the connector insertion tasks, we present grasping results obtained using decomposition in the visual servoing control law to control three translation DOF for approaching and grasping a connector. This vision-driven grasp, shown in Figure 5-21, was used as part of the connector insertion strategies. The feature acquisition and placement was performed manually for this experiment, and SSD feature tracking was used. Two approximately orthogonal cameras are used to track features on the parts -- one camera drives two DOF, and the second camera controls the "depth" DOF. Three distinct phases are visible in the grasp: the transport, the approach, and the depart. The transport involves visual servoing along two directions, but not the approach direction, to align the gripper above the part. The approach involves mainly a straight-line motion along the approach direction, but all three translations are visually servoed to compensate for calibration-induced errors. Finally, the part is grasped and an open-loop depart move is executed to move away from the table.
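The perturbation of Vcmd into Vref can be sketched as a simple per-axis damping (accommodation) law: the difference between the desired and measured contact force, scaled by an admittance gain, is added to the commanded velocity. The gains, setpoints, and sign conventions below are placeholder assumptions; the actual damping force controller in the system may be formulated differently.

#include <stdio.h>

typedef struct { double x, y, z; } Vec3;

/* Per-axis admittance gains (m/s per N) and desired contact force (N);
 * values are illustrative only. */
static const Vec3 Ka = { 0.0, 0.001, 0.0 };   /* comply only along Y here */
static const Vec3 Fd = { 0.0, -2.0,  0.0 };   /* maintain ~2 N of contact */

/* Vref = Vcmd + Ka*(Fd - Fmeas): excess contact force backs the motion off. */
static Vec3 accommodate(Vec3 v_cmd, Vec3 f_meas)
{
    Vec3 v_ref = {
        v_cmd.x + Ka.x * (Fd.x - f_meas.x),
        v_cmd.y + Ka.y * (Fd.y - f_meas.y),
        v_cmd.z + Ka.z * (Fd.z - f_meas.z),
    };
    return v_ref;
}

int main(void)
{
    Vec3 v_cmd = { 0.01, -0.005, 0.0 };   /* strategy-commanded velocity */
    Vec3 f     = { 0.0,  -4.0,   0.0 };   /* measured contact force      */
    Vec3 v_ref = accommodate(v_cmd, f);   /* perturbed velocity          */
    printf("Vref = (%.4f, %.4f, %.4f) m/s\n", v_ref.x, v_ref.y, v_ref.z);
    return 0;
}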


Figure 5-21: Vision-Guided Grasp (X, Y, Z velocities vs. time; annotated phases: visual servoing in X and Z only during transport, servoing in X, Y, and Z with calibration error compensation during the approach, gripper closing, and the depart move)

5.3.3 BNC Connector Insertion

Figure 5-22: BNC Connector Task

This task is a standard BNC connector insertion. The BNC connector has a very different geometry from the D-connector, but the same primitives are used to implement a task strategy. The BNC insertion strategy has the same three basic steps as the D-connector insertion: 1) grasp the connector, 2) transport it to the mating connector, and 3) perform the insertion. The grasp and transport phases are essentially the same as those for the D-connector. However, the insertion phase in this case is a more complex, multi-state event, and force sensing is used to trigger the transitions. The first part of the insertion step is the guarded move followed by dithering (and correlation) to acquire the first constraint: no translation in the plane defined by the insertion axis. Once this constraint has been acquired, the holes in the bayonet must be aligned with the stubs on the connector shaft. The movedx primitive performs the rotation while the stick primitive maintains contact and monitors for movement along the insertion axis. Once movement along the insertion axis occurs, we know that the connector has mated with the stubs, so we terminate the rotation. Another rotation locks the bayonet, and finally the connector is released. The stick primitive in the D-connector task was used to detect failure, since the primary failure mode there was losing contact. Here the same primitive is used to signal a task state transition when the bayonet mates with the shaft stubs.
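A sketch of how this mating event might be detected is shown below: once contact has been acquired, displacement along the insertion axis (taken as -Y here, following the plots) is monitored while the rotation proceeds, and travel beyond a small threshold indicates that the bayonet holes have dropped onto the stubs. The threshold value and the sampling interface are illustrative assumptions.

#include <stdio.h>
#include <stdbool.h>

#define MATE_DISP_THRESH_M 0.0005   /* 0.5 mm of travel signals mating (assumed) */

/* Returns true once the tool has advanced past the threshold along the
 * insertion axis relative to where contact was first acquired. */
static bool bayonet_mated(double y_at_contact, double y_now)
{
    return (y_at_contact - y_now) > MATE_DISP_THRESH_M;
}

int main(void)
{
    double y_contact    = 0.1200;                               /* Y at contact */
    double y_samples[4] = { 0.1200, 0.1199, 0.1198, 0.1193 };   /* Y while rotating */

    for (int i = 0; i < 4; i++) {
        if (bayonet_mated(y_contact, y_samples[i])) {
            printf("mated at sample %d: stop rotation, lock bayonet\n", i);
            break;
        }
    }
    return 0;
}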

Figure 5-23: BNC Insertion Strategy (Grasp, Transport, and Insert phases composed from the gmove, vis_xz, vis_xyz, grip, movedx, ldither, correlate, and stick primitives)

Figure 5-24: BNC Connector Insertion Results (Vx, Vy, Vz, and ωy/15 vs. time across the grasp, transport, and insert phases)



Figure 5-25: BNC Insertion Stage (velocity components vs. time; annotated events: end of transport phase, guarded move, contact, dithering, correlation threshold tripped, rotation to mate bayonet stubs, movement along -Y, rotation to lock bayonet, gripper opened, depart move)

5.3.4 D-connector Insertions

Figure 5-26: 25 and 9 pin D-connector Tasks

This task is a basic 25-pin D-connector insertion. Given the small scale of the contact, we cannot reasonably derive a strategy based on a detailed contact analysis. Instead, a heuristic strategy was developed from the available command primitives (some sensor-driven, some not) and sensing; it is shown as a finite-state machine (FSM) in Figure 5-27. The basic strategy has three steps: 1) grasp the connector, 2) transport it to the mating connector, and 3) perform the insertion. The first two steps are dominated by vision feedback; the third step is dominated by force feedback. The first step, grasping, relies on approximate angular alignment of the connector axes (X, Z) with the camera optical axes. Visual setpoints are identified in the images and controlled through visual feedback. The second step also uses visual feedback, to position the grasped connector above the mating connector for insertion. The insertion step involves a guarded move, followed by a mixture of "sticking" along with rotational and linear sinusoidal dithering (at different frequencies) and correlation monitoring of the linear dithering. The dithering introduces enough variation in the command to resolve small uncertainties left over from initial positioning. The correlation of the commanded and force-perturbed reference velocities provides a means to reliably detect when the connector has seated. Note that this "success-detection" method does not rely on attaining absolute position goals, but rather on attaining particular motion constraints.
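The success test can be sketched as follows: over a sliding window, the commanded dither velocity is compared with the force-perturbed reference velocity along the dither direction. While the connector is unseated the reference follows the command, but once it seats the accommodation controller suppresses the dither motion and the measure drops below a threshold. The window length, threshold, and the particular normalization used here (the ratio of the cross-correlation to the command autocorrelation) are illustrative assumptions and may differ from the measure actually used.

#include <math.h>
#include <stdio.h>

#define WIN 50   /* samples in the correlation window (assumed) */

/* Ratio of cross-correlation to command autocorrelation over a window:
 * near 1 when the reference velocity follows the commanded dither,
 * near 0 when the motion has become constrained. */
static double dither_correlation(const double *cmd, const double *ref, int n)
{
    double num = 0.0, den = 0.0;
    for (int i = 0; i < n; i++) {
        num += cmd[i] * ref[i];
        den += cmd[i] * cmd[i];
    }
    return (den > 0.0) ? num / den : 0.0;
}

int main(void)
{
    double cmd[WIN], ref_free[WIN], ref_seated[WIN];
    for (int i = 0; i < WIN; i++) {
        cmd[i]        = 0.005 * sin(0.4 * i);   /* commanded linear dither   */
        ref_free[i]   = 0.9  * cmd[i];          /* unconstrained: follows    */
        ref_seated[i] = 0.05 * cmd[i];          /* seated: motion suppressed */
    }
    printf("free:   %.2f\n", dither_correlation(cmd, ref_free,   WIN));
    printf("seated: %.2f\n", dither_correlation(cmd, ref_seated, WIN));
    /* A threshold on this value (e.g. 0.5) would trip the "seated" event. */
    return 0;
}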
Figure 5-27: D-connector Insertion Strategy (Grasp, Transport, and Insert phases composed from the vis_xz, vis_xyz, grip, movedx, gmove, ldither, rdither, correlate, and stick primitives)

Figure 5-28: D-connector Insertion Results (Vx, Vy, Vz, and ωy/15 vs. time across the grasp, transport, and insert phases)

Figure 5-29: D-connector Insertion Stage (velocity vs. time; annotated events: end of transport phase, guarded move, contact, dithering, correlation threshold tripped, gripper opened, depart move)



5.3.5 Press-fit Connector

Figure 5-30: Military-style Press Fit Connector

Figure 5-31: Press-fit Connector Skill (FSM with states including touch, rotate, rotate_back, relax, mate, press, and release, in addition to the default create, start, halt, destroy, and reset states)

This strategy was performed without visual servoing feedback because feature tracking was especially difficult for this task: the threads on the female connector add texture which our feature tracker cannot handle. Another fiducial mark could be added to improve this. The basic strategy is to tilt the peg by 5 degrees and move to acquire the contact. Then, while applying a sticking force in Y and Z, the peg is rotated slowly to vertical to trap it inside the hole. At this point the peg is resting on top of the ridge inside the hole, which is only a millimeter or so below the top surface, and a rotational guarded move is executed to line up the slot. The peg is then pushed down with 40 N of force to accomplish the press fit. We found it necessary to introduce a fast dither in Z to ease the insertion. The parameter values are selected by the human and are fairly arbitrary; one extension of this research is to optimize them to maximize robustness and/or performance.
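A sketch of how the final press phase might be commanded is given below: a constant press force along the insertion direction with a fast sinusoidal dither superimposed along Z. The 40 N press force comes from the text above; the dither amplitude and frequency, the axis assignment, and the command structure are illustrative assumptions.

#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define PRESS_FORCE_N  40.0    /* press force along the insertion axis        */
#define DITHER_AMP_MS  0.002   /* 2 mm/s peak dither speed (assumed)          */
#define DITHER_FREQ_HZ 8.0     /* "fast" dither frequency (assumed)           */

typedef struct { double fy; double vz; } PressCmd;

/* Press force along -Y (insertion direction per the Y-movement plot) plus a
 * fast velocity dither along Z. */
static PressCmd press_command(double t)
{
    PressCmd c;
    c.fy = -PRESS_FORCE_N;
    c.vz = DITHER_AMP_MS * sin(2.0 * M_PI * DITHER_FREQ_HZ * t);
    return c;
}

int main(void)
{
    for (double t = 0.0; t <= 0.05; t += 0.01) {
        PressCmd c = press_command(t);
        printf("t=%.2f  Fy=%.1f N  Vz=%+.4f m/s\n", t, c.fy, c.vz);
    }
    return 0;
}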

Figure 5-32: Press Fit Connector Velocities (linear and angular velocity components vs. time)



Figure 5-33: Press Fit Connector Force Command and Y Movement (commanded force components in N and Y displacement in m vs. time)

Figure 5-34: Press Fit Connector Forces (contact force and torque vs. time)


5.4 Summary
This chapter presented a finite-state machine representation of a sensor-based skill. Each part of the skill must specify a controller, a set of trajectories (in force and/or velocity), and event detectors to force state transitions. The Chimera Agent 4th Level was introduced for constructing event-driven real-time programs. The Agent level supports objects which process only events (fsms) and objects which can process both events and data (agents). Agents provide a bridge between the fast data processing occurring on the 3rd level through periodic, port-based modules and the asynchronous, event-based state machines in the 4th level.

Six different but related tasks demonstrated experimental solutions with shared sensor-based primitives based on damping force feedback and visual servoing. The strategies are differentially specified -- no absolute positions are used as cues for execution or termination. This allows the skills to be used wherever a higher-level program can initialize them appropriately relative to the task. This differential nature of skill definition contrasts with some other skill-building efforts which use absolute position data as input and output -- for example Gullapalli's peg insertion skills [23]. Philosophically, the idea is to get away from specifying tasks in terms of absolute robot movement and toward specifying tasks in terms of the robot's view of them -- their projection onto the robot's sensors.

The weakest part of the skill execution is the vision integration -- feature acquisition and tracking. Currently the feature selection is done manually, which obviously limits the usefulness of a skill. In the next chapter a very preliminary approach to automatic feature acquisition is presented. More serious is the (relative) lack of robustness in feature tracking. The corner feature tracker is more robust than SSD for tracking occluding boundaries, but the confidence measures clearly decline when the features are brought close together. What is needed are more sophisticated trackers which incorporate task model information to provide additional constraints for robustness. This brings up two problems: 1) making the trackers too task-specific, and 2) increasing the computation time. The obvious first step is to key on fiducials which are designed into the task(s). Keying on 'natural' task features is more challenging, but recurring features like edges, vertices, and surfaces can be used. One advantage of fiducials is that both geometric projection and photometric effects can be addressed. The photometric effects are extremely significant and often ignored [19]. Specularity, shadows, and other real-world imaging phenomena caused significant problems with simple feature tracking algorithms. However, fiducials are usually internal targets which necessarily introduce offsets -- these offsets introduce additional
