Course syllabus
(4923) PROGRAMACIÓN AVANZADA DE ARQUITECTURAS MULTINÚCLEO

Academic term 2024/2025

  1. Identification
    1. About the course
    2. Academic term: 2024/2025
      Degree: MÁSTER UNIVERSITARIO EN NUEVAS TECNOLOGÍAS EN INFORMÁTICA
      Course: PROGRAMACIÓN AVANZADA DE ARQUITECTURAS MULTINÚCLEO
      Code: 4923
      Year: First
      Course type: Elective
      Number of groups: 1
      ECTS: 6.0
      Estimated workload (hours): 150.0
      Timeline: Second semester
      Languages: Spanish

    3. Teaching staff
      • JIMBOREAN, ALEXANDRA Professor: GRUPO 1 Group coordination: GRUPO 1 Course coordinator

        Category

        "Ramón y Cajal" researcher

        Area

        ARQUITECTURA Y TECNOLOGÍA DE COMPUTADORES

        Department

        INGENIERÍA Y TECNOLOGÍA DE COMPUTADORES

        Email / Personal web page / Online tutoring sessions

        alexandra.jimborean@um.es https://webs.um.es/alexandra.jimborean/ Online tutoring sessions:

        Phone number and office hours

        Duration: A
        Day: Wednesday
        Hours: 11:00-14:00
        Place: There are no records
        Remarks: There are no records
      • RAVINDRANATH REDDY, RAVIKIRAN Professor: GRUPO 1 Group coordination:

        Category

        Predoctoral researcher

        Area

        There are no records

        Department

        INGENIERÍA Y TECNOLOGÍA DE COMPUTADORES

        Email / Personal web page / Online tutoring sessions

        ravikiran.r.r@um.es Online tutoring sessions: No

        Phone number and office hours

      • SILLARS, EMILY MELISSA Professor: GRUPO 1 Group coordination:

        Category

        Predoctoral contract researcher (FPU INVES-UM)

        Area

        There are no records

        Department

        There are no records

        Email / Personal web page / Online tutoring sessions

        emily.sillars@um.es Online tutoring sessions: No

        Phone number and office hours

  2. Presentation
  3. Today, multicore processors are present in all market segments, from embedded systems (for example, the CPUs of the PlayStation 3 or Xbox 360 consoles) to supercomputers (for example, the chips used to build BlueGene/L or BlueGene/P), personal computers (for example, Intel Core 2 Duo/Quad or i7 processors, or AMD Athlon 64 X2 or Phenom X3/X4) and servers (for example, the Intel Xeon or AMD Opteron processor ranges).

    The course gives an introduction to parallel programming and hardware. Several parallel programming frameworks are presented and their advantages and disadvantages are discussed. The main objective is for students to become familiar with the architecture, system software, and programming techniques and tools of the most common multicore processor-based systems in use today.

    Using a combination of flipped-classroom and in-class lecturing, the course presents an overview of several frameworks and techniques for parallel programming:

    1 Threads

    2 OpenMP

    3 SIMD

    4 OpenCL/CUDA

    5 Heterogeneous computing

    By the end of the course you are not expected to be an expert in each of these frameworks, but you should understand the advantages and disadvantages of each model and know how to approach a project that requires parallelism.

    We will address issues such as:

    • When is it possible to parallelize a sequential algorithm?
    • When is parallelization harmful, and when is it beneficial?
    • How can the code be parallelized correctly?
    • Which parallelization approach is the most efficient?
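
The first of these questions can be illustrated with a minimal C++ sketch. It is not taken from the course material: the function names `scale_positions` and `prefix_sum` are hypothetical, and the only framework feature assumed is the standard OpenMP `parallel for` directive. The first loop has independent iterations and can be parallelized directly; the second has a loop-carried dependence and cannot be parallelized by simply adding a directive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on in[i], so the loop can be
// split across threads. If OpenMP is not enabled at compile time, the pragma
// is ignored and the loop runs sequentially with the same result.
std::vector<float> scale_positions(const std::vector<float>& in, float factor) {
    std::vector<float> out(in.size());
    const long n = static_cast<long>(in.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
    return out;
}

// Loop-carried dependence: iteration i needs the running total produced by
// iteration i - 1. Adding "parallel for" here would create a data race on
// `running`, so the loop is deliberately left sequential.
std::vector<int> prefix_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];
        out[i] = running;
    }
    return out;
}
```

Parallelizing the second function would require restructuring the algorithm (for example, as a parallel scan) rather than annotating the existing loop.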

    You will get hands-on experience through programming assignments, applying parallelization techniques to parallelize a sequential algorithm for a Crowd Simulation with Collision Avoidance.

    The course is a continuation of Multicore Architecture Programming.

  4. Conditions of access to the course
    1. Incompatibilities
    2. There are no records

    3. Requirements
    4. There are no records

    5. Recommendations
    6. It is recommended to have passed the courses "Parallel Programming and High Performance Computing", "Advanced Aspects in General Purpose Multicore Architectures" and "Multicore Architecture Programming".

      In addition, students are advised to have general knowledge of concurrent programming and of programming in C++.

  5. Competencies
    1. Basic competencies
      • CB6: Possess and understand knowledge that provides a basis or opportunity for originality in developing and/or applying ideas, often in a research context.
      • CB7: Students can apply the knowledge they have acquired, and their problem-solving ability, in new or unfamiliar environments within broader (or multidisciplinary) contexts related to their field of study.
      • CB8: Students are able to integrate knowledge and handle the complexity of formulating judgments based on information that, while incomplete or limited, includes reflections on the social and ethical responsibilities linked to the application of their knowledge and judgments.
      • CB9: Students can communicate their conclusions, and the knowledge and rationale underpinning them, to specialist and non-specialist audiences clearly and unambiguously.
      • CB10: Students possess the learning skills that allow them to continue studying in a manner that will be largely self-directed or autonomous.

    2. Degree competencies
      • CGT1: Ability to understand and apply research methods and techniques in the field of Computer Engineering.
      • CGT2: Ability to plan, calculate and design products, processes and installations in all areas of computer engineering.
      • CET3: Ability to integrate acquired knowledge and apply it to solve problems in new or unfamiliar environments within broader, multidisciplinary contexts.

    3. Transversal and course competencies
      • CAA6: Ability to analyze, design, develop, debug and optimize parallel applications by exploiting the underlying programming model and architecture.
      • CAA5: Ability to identify, for a given problem, its computational needs and the most appropriate high-performance computing techniques for solving it.
      • CAA7: Ability to use and develop research methodologies, methods and techniques in the fields of High-Performance Architectures and Supercomputing, with the capacity to innovate.

  6. Contents
    1. Theoretical contents
    2. Theme 1: Introduction to parallel architectures and parallel programming.

      Theme 2: Programming with threads and OpenMP

      Theme 3: Vectorization: Single Instruction Multiple Data (SIMD)

      Theme 4: Programming in CUDA/OpenCL

      Theme 5: Synchronization mechanisms

      Theme 6: Lock-free programming

      Theme 7: CUDA memories

      Theme 8: Heterogeneous programming

      Theme 9: Asynchronous Heterogeneous Programming
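
As an illustration of the kind of code Theme 3 deals with, the following is a minimal C++ sketch of a vectorization-friendly data layout. It is an assumption-laden example, not course code: the `Agents` struct, its fields and the `step` function are hypothetical names, and the only framework feature used is the standard OpenMP `simd` directive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Structure-of-arrays layout (hypothetical): all x coordinates are stored
// contiguously, as are all y coordinates, so several agents can be loaded
// into one vector register at a time.
struct Agents {
    std::vector<float> x, y;    // current positions
    std::vector<float> vx, vy;  // displacement applied on each step
};

// Advance every agent by one step. The "omp simd" pragma asks the compiler
// to vectorize the loop; if OpenMP is not enabled the pragma is ignored and
// the loop still computes the same result.
void step(Agents& a) {
    const std::size_t n = a.x.size();
    assert(a.y.size() == n && a.vx.size() == n && a.vy.size() == n);
    float* x = a.x.data();
    float* y = a.y.data();
    const float* vx = a.vx.data();
    const float* vy = a.vy.data();
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += vx[i];
        y[i] += vy[i];
    }
}
```

The design choice being sketched is the layout: with an array of per-agent structs, the x values of neighbouring agents would be interleaved with other fields, which typically makes vector loads less efficient.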

    3. Practical contents
      • Practical activity 1: Set-up and introduction

        During the first lab (Lab 0), we will present the Crowd Simulation project and guide you through the code base. For your convenience, all the required software has been installed on the computers in the lab. Test that all frameworks are correctly installed and that the code compiles and runs correctly.

        The Crowd Simulation project is divided into several assignments, on which you will work in groups of two or three, concluding with a final presentation at the end of the course.

        You will start working together during Lab 0 on Assignment 1 (A1), which transforms the sequential implementation into a parallel version using threads and OpenMP.

        Related to:
        • Theme 1: Introduction to parallel architectures and parallel programming.
        • Theme 2: Programming with threads and OpenMP
      • Practical activity 2: Finding Bottlenecks and Parallelizing with PThreads and OpenMP

        Present A1: parallelizing parts of the Crowd Simulation application using dedicated libraries (threads and OpenMP).

        Start working on A2: parallelizing parts of the Crowd Simulation application using basic techniques for vectorization (SIMD) for CPUs and CUDA/OpenCL for GPUs.

        Related to:
        • Theme 2: Programming with threads and OpenMP
        • Theme 3: Vectorization: Single Instruction Multiple Data (SIMD)
        • Theme 4: Programming in CUDA/OpenCL
      • Practical activity 3: Parallelization with SIMD and CUDA/OpenCL

        Assignment A2: parallelizing parts of the Crowd Simulation application using basic techniques for vectorization (SIMD) for CPUs and CUDA for GPUs.

        Related to:
        • Theme 3: Vectorization: Single Instruction Multiple Data (SIMD)
        • Theme 4: Programming in CUDA/OpenCL
      • Practical activity 4: Parallelization with SIMD and CUDA/OpenCL (2)

        Present A2.

        Start working on Assignment A3 on avoiding collisions using synchronization mechanisms.

        Related to:
        • Theme 3: Vectorization: Single Instruction Multiple Data (SIMD)
        • Theme 4: Programming in CUDA/OpenCL
      • Practical activity 5: Task-based parallelization, synchronization, lock-free

        Assignment A3: Implement a Collision Avoidance algorithm for the Crowd Simulation, using tasks and dedicated synchronization constructs. Bonus points can be obtained for a lock-free version of the implementation.

        Related to:
        • Theme 5: Synchronization mechanisms
        • Theme 6: Lock-free programming
      • Practical activity 6: Task-based parallelization, synchronization, lock-free (2)

        Present A3.

        Start working on A4 on combining CPU and GPU computing for the Crowd Simulation with Collision Avoidance from A3.

        Related to:
        • Theme 4: Programming in CUDA/OpenCL
        • Theme 5: Synchronization mechanisms
        • Theme 6: Lock-free programming
        • Theme 7: CUDA memories
        • Theme 8: Heterogeneous programming
        • Theme 9: Asynchronous Heterogeneous Programming
      • Practical activity 7: Heterogeneous computing

        Assignment A4: Parallelize selected parts of the algorithm using CUDA/OpenCL and run them on the GPU at the same time as the collision handling from A3 is being run on the CPU. Avoid race conditions.

        Related to:
        • Theme 4: Programming in CUDA/OpenCL
        • Theme 7: CUDA memories
        • Theme 8: Heterogeneous programming
        • Theme 9: Asynchronous Heterogeneous Programming
      • Practical activity 8: Heterogeneous computing (2)

        Present A4.

        Related to:
        • Theme 7: CUDA memories
        • Theme 8: Heterogeneous programming
        • Theme 9: Asynchronous Heterogeneous Programming
      • Practical activity 9: Present assignments

        Present A1, A2, A3 or A4 if they were not passed in the dedicated lab (for instance, if the teacher provided feedback that had to be integrated before the assignment could be considered complete).

        Related to:
        • Theme 1: Introduction to parallel architectures and parallel programming.
        • Theme 2: Programming with threads and OpenMP
        • Theme 3: Vectorization: Single Instruction Multiple Data (SIMD)
        • Theme 4: Programming in CUDA/OpenCL
        • Theme 5: Synchronization mechanisms
        • Theme 6: Lock-free programming
        • Theme 7: CUDA memories
        • Theme 8: Heterogeneous programming
        • Theme 9: Asynchronous Heterogeneous Programming
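
To make the lock-free idea mentioned for Assignment A3 more concrete, here is a minimal C++ sketch of one possible approach: claiming a grid cell with an atomic compare-and-swap. It is not the course's implementation. The `Grid` type, the `kEmpty` marker and the `try_move` function are hypothetical, and the sketch assumes that each cell stores the id of the agent occupying it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kEmpty = -1;  // assumed marker for an unoccupied cell

// Hypothetical grid: each cell holds the id of the agent on it, or kEmpty.
struct Grid {
    std::vector<std::atomic<int>> cells;
    explicit Grid(std::size_t n) : cells(n) {
        for (auto& c : cells) c.store(kEmpty);
    }
};

// Try to move `agent` from cell `from` to cell `to`. The compare-exchange
// succeeds only if `to` is still empty, so at most one agent can claim a
// cell even when several threads attempt it at once. No mutex is taken.
// Returns true if the move happened, false if the agent must stay put.
bool try_move(Grid& g, int agent, std::size_t from, std::size_t to) {
    assert(from < g.cells.size() && to < g.cells.size());
    int expected = kEmpty;
    if (!g.cells[to].compare_exchange_strong(expected, agent)) {
        return false;  // another agent already occupies the target cell
    }
    g.cells[from].store(kEmpty);  // free the old cell only after the claim
    return true;
}
```

A single-threaded walk-through: with agent 7 on cell 0 and agent 8 on cell 1, `try_move(g, 7, 0, 2)` succeeds, and a later `try_move(g, 8, 1, 2)` fails and leaves agent 8 where it was.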

  7. Training activities
  8. Training activity | Methodology | Hours | In-person
    A1: Large-group activities, including classroom presentation of the subject's concepts using an expository methodology with participative lectures and audiovisual media. Theoretical-practical assessment activities are also included in this group.
      Methodology: The theory part of the course will be explained by means of participative lectures and audiovisual media. For class participation to be effective, students are required to prepare before each class and to take part in the exercises and open questions discussed in class.
      Hours: 17.5
      In-person: 40.0

    A2: Medium-group classroom activities: problem solving, seminars, talks, exercises based on project-oriented learning, case studies, and presentation and discussion of work related to individual and/or group monitoring of competency acquisition.
      Methodology: Theoretical and practical exercises will be carried out in class with the participation of the students. In addition, students will be required to carry out more exercises on their own. For this purpose, they will be provided with a list of exercises for each topic.
      Hours: 12.5
      In-person: 12.5

    A3: Small-group laboratory activities related to the practical component of the courses, development of work with specialized technical equipment, program development, etc.
      Methodology: During the labs, students will work autonomously on theoretical-practical assignments, guided and helped by the teacher.
      Hours: 30.0
      In-person: 37.5

    A5: Autonomous study and work aimed at assimilating content, solving problems and exercises, writing technical or descriptive reports, developing individual or group projects or practical work, preparing for exams, and presenting and defending work.
      Methodology: Autonomous work of the student.
      Hours: 90.0
      In-person: 0.0

    Total: 150.00

  9. Course schedule
  10. https://www.um.es/web/estudios/masteres/tecnologias-informatica/2024-25#horarios

  11. Assessment systems
  12. Identifier | Name of the assessment tool | Assessment criteria | Weighting
    IE1 Theoretical-practical exam: This tool ranges from the traditional written or multiple-choice exam to problem-solving exams, including mixed exams that combine short or essay-type theoretical questions with small problems. It also covers the student's active participation in class, the submission of exercises, and the completion of short written assignments and presentations.

    The evaluation is built around the course's learning outcomes. Students are expected to demonstrate that they have acquired the skills listed in the learning outcomes and will be evaluated on the Crowd Simulation project, which is divided into four assignments solved in groups during the weekly labs. In addition to the presentation of each assignment during the labs, there will be a final oral presentation and a final report.

    At the end of the oral presentation of each team, the members of the team will be asked questions on how the topics taught in the course have been applied to parallelize the Crowd Simulation with Collision Avoidance.

    1 Construct parallel algorithms, i.e., identify parallelism in a given algorithm, implement this parallelism, and identify factors that limit the parallelism in a program or algorithm.

    - Given a sequential algorithm, transform it into a parallel algorithm, exposing at least one of the parallelization opportunities: data parallelism, task parallelism, vectorization. Explain the choice of parallelization and reason why other forms of parallelism are not possible.

    - Identify all sources of parallelism, including combinations of data parallelism, task parallelism and vectorization.

    - Identify bottlenecks and factors that limit parallelism (e.g., dependences) and solutions to overcome such limitations (e.g., a new task scheduling).

    - Given an algorithm, reason about which parallel programming paradigm is the most suitable, discussing the advantages and disadvantages of each parallelization method.

    2 Explain key issues of parallel programming, including data distribution, load balancing, locking and synchronization.

    - Ensure correctness of execution, by using appropriate synchronization methods.

    - Identify performance bottlenecks, such as data distribution and load balancing, and suggest solutions to reduce them; for example, finer or coarser granularity tasks, different scheduling policies.

    - Design, implement and evaluate such advanced solutions to reduce or alleviate the performance bottlenecks.

    - Reason about the characteristics of the algorithms that lead to such performance bottlenecks and which are optimal choices in terms of the parallel programming paradigms, granularity, distribution, synchronization, etc.

    3 Compare several parallel programming frameworks in terms of performance and efficiency of development.

    Weighting: 50.0
    IE2 Technical report: This tool covers the results of practical or laboratory activities together with their descriptive reports, state-of-the-art summaries, and research reports on specific topics. Personal interviews or presentations of the work carried out also fall into this category.

    The Crowd Simulation project is divided into four assignments, solved in teams during the weekly practice labs. Each of the four assignments will be demonstrated during the labs. In addition to correctness, the design of the proposed solution will also be considered. Each student in the team is expected to answer the teacher's questions and to demonstrate their participation in solving and implementing the assignment.

    Weighting: 50.0

  13. Exam dates
  14. https://www.um.es/web/estudios/masteres/tecnologias-informatica/2024-25#examenes

  15. Learning outcomes
  16. On completion of the course, the student should be able to:

    • Explain key issues of parallel programming, including data distribution, load balancing, locking and synchronization, and basic concurrent data structures.
    • Construct parallel algorithms, i.e., identify parallelism in a given algorithm, implement this parallelism, and identify factors that limit the parallelism in a program or algorithm.
    • Compare several parallel programming frameworks in terms of performance and efficiency of development.
    • Use several high-performance parallel programming frameworks and choose an appropriate framework under given circumstances such as computer architecture, application and efficiency.
    • Use parallel programming patterns to develop multithreaded applications on general-purpose multicore architectures and on GPUs.
    • Use common libraries that support building multithreaded applications.
    • Debug multi-threaded applications with dedicated techniques and tools.

  17. Bibliography
  18. Group: GRUPO 1

    Basic bibliography

    There are no records

    Further reading

  19. Remarks
  20. Additional details about the Assessment

    Students can only be evaluated in the final presentation of the project if each of the four assignments has been presented and passed during the weekly labs.

    Sustainable Development Goals

    This subject is not directly linked to the Sustainable Development Goals.

    SPECIAL EDUCATIONAL NEEDS

    Students with disabilities or special educational needs may contact the Service of Attention to Diversity and Volunteering (ADYV - https://www.um.es/adyv) to receive guidance on making the most of their studies and, where appropriate, on the adoption of equalization and improvement measures for inclusion, under Rectoral Resolution R-358/2016. Information about these students is treated as strictly confidential, in compliance with the LOPD.

    STUDENT EVALUATION REGULATIONS

    Article 8.6 of the Student Evaluation Regulation (REVA) provides that "except in the case of activities defined as compulsory in the teaching guide, if the student is unable to follow the continuous evaluation process due to duly justified supervening circumstances, he/she shall be entitled to take a global test".

    It is also recalled that Article 22.1 of the Student Evaluation Regulations (REVA) stipulates that "the student who uses fraudulent conduct, including the improper attribution of identity or authorship, or is in possession of means or instruments that facilitate such conduct, will obtain a grade of zero in the evaluation procedure and, where appropriate, may be subject to sanction, after opening disciplinary proceedings".