CCA-410 Certified Administrator Apache Hadoop CDH4 Exam Set 3

Your company stores user profile records in an OLTP database. You want to join these records with webserver logs, which you have already ingested into the Hadoop file system. What is the best way to obtain and ingest these user records?


Options are :

  • Ingest with Sqoop import
  • Ingest with Hadoop Streaming
  • Ingest using the HDFS put command
  • Ingest with Flume agents
  • Ingest using Hive's LOAD DATA command
  • Ingest with Pig's LOAD command

Answer : Ingest with Flume agents

Identify the daemon that performs checkpoint operations of the namespace state in a cluster configured with HDFS High Availability (HA) using Quorum-based storage?


Options are :

  • JournalNode
  • NodeManager
  • NameNode
  • BackupNode
  • Standby NameNode
  • Secondary NameNode
  • CheckpointNode

Answer : Secondary NameNode

In the execution of a MapReduce job, where does the mapper place the intermediate data in each map task?


Options are :

  • The Hadoop framework holds the intermediate data in the TaskTracker's memory until it is transferred to the reducers
  • The mapper transfers the intermediate data to the JobTracker, which then sends it to the reducers
  • The mapper transfers the intermediate data immediately to the reducers as it is generated by the map task.
  • The mapper stores the intermediate data on the underlying filesystem of the local disk of the machine which ran the map task

Answer : The mapper stores the intermediate data on the underlying filesystem of the local disk of the machine which ran the map task

You have a cluster running with the FIFO Scheduler enabled. You submit a large job A to the cluster, which you expect to run for one hour. Then, you submit job B to the cluster, which you expect to run for only a couple of minutes. You submit both jobs with the same priority. Which two best describe how the FIFO Scheduler arbitrates the cluster resources for a job and its tasks?


Options are :

  • The order of execution of tasks within a job may vary.
  • Given Jobs A and B submitted in that order, all tasks from job A are guaranteed to finish before all tasks from job B.
  • The FIFO Scheduler will give, on average, equal share of the cluster resources over the job lifecycle.
  • The FIFO Scheduler will pass an exception back to the client when Job B is submitted, since all slots on the cluster are in use.
  • Because there is more than a single job on the cluster, the FIFO Scheduler will enforce a limit on the percentage of resources allocated to a particular job at any given time.
  • B,C
  • Tasks are scheduled in the order of their jobs' submission.

Answer : B,C

When requesting a file, how does HDFS retrieve the blocks associated with that file?


Options are :

  • The NameNode queries the DataNodes for the block IDs
  • The NameNode reads the block IDs from memory
  • The NameNode reads the block IDs from disk
  • The client polls the DataNodes for the block IDs

Answer : The NameNode reads the block IDs from disk

What happens if a Mapper on one node goes into an infinite loop while running a MapReduce job?


Options are :

  • The Mapper will run indefinitely; the TaskTracker must be restarted to kill it
  • After a period of time, the JobTracker will restart the TaskTracker on the node on which the map task is running
  • The job will immediately fail.
  • After a period of time, the TaskTracker will kill the Map Task.

Answer : After a period of time, the TaskTracker will kill the Map Task.

What determines the number of Reducers that run for a given MapReduce job on a cluster running MapReduce v1 (MRv1)?


Options are :

  • It is set by the JobTracker based on the amount of intermediate data.
  • It is set by the developer.
  • It is set by the Hadoop framework and is based on the number of InputSplits of the job.
  • It is set and fixed by the cluster administrator in mapred-site.xml. The number set always runs for any submitted job.

Answer : It is set by the developer.
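
The keyed answer is that the developer chooses the reducer count. As a minimal sketch only: one common way to do that from the command line is the MRv1 property mapred.reduce.tasks, here wrapped in Python's subprocess. The streaming jar path and the HDFS input/output paths below are placeholders for a typical CDH4 layout, not values from the question.

    import subprocess

    # Submit a Hadoop Streaming job with a developer-chosen reducer count
    # via the MRv1 property mapred.reduce.tasks. Jar path and HDFS paths
    # are placeholders for a typical CDH4 install.
    subprocess.run([
        "hadoop", "jar", "/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming.jar",
        "-D", "mapred.reduce.tasks=4",        # reducer count set by the developer
        "-input", "/user/alice/weblogs",
        "-output", "/user/alice/weblogs-out",
        "-mapper", "cat",
        "-reducer", "wc -l",
    ], check=True)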

For each job, the Hadoop framework generates task log files. Where are Hadoop's task log files stored?


Options are :

  • On the local disk of the slave node running the task.
  • In HDFS, in the directory of the user who generates the job.
  • Cached on the local disk of the slave node running the task, then purged immediately upon task completion.
  • Cached on the local disk of the slave node running the task, then copied into HDFS.

Answer : On the local disk of the slave node running the task.

What happens when a map task crashes while running a MapReduce job?


Options are :

  • The job immediately fails
  • The JobTracker attempts to re-run the task on the same node
  • The JobTracker attempts to re-run the task on a different node
  • The TaskTracker closes the JVM instance and restarts

Answer : The JobTracker attempts to re-run the task on a different node

You configure a Hadoop cluster with both MapReduce frameworks, MapReduce v1 (MRv1) and MapReduce v2 (MRv2/YARN). Which two MapReduce (computational) daemons do you need to configure to run on your master nodes?


Options are :

  • Node Manager
  • Application Master
  • Resource Manager
  • Journal Node
  • A,B
  • JobTracker

Answer : A,B

Which two occur when individual blocks are written to a DataNode on the cluster's local filesystem?


Options are :

  • The DataNode runs a block scanner (DataBlockScanner) to verify the written blocks.
  • A metadata file is written to the DataNode containing the checksums for each block
  • The DataNode writes a metadata file with the name of the file the block is associated with
  • A metadata file is written to the DataNode containing all the other node locations in the namespace
  • The DataNode updates its log of checksum verification
  • B,E

Answer : B,E

You are running a Hadoop cluster with a NameNode on the host mynamenode. What are two ways you can determine the available HDFS space in your cluster?


Options are :

  • Run hadoop fs -du / and locate the DFS Remaining value
  • Connect to http://mynamenode:50070/ and locate the DFS Remaining value
  • Run hadoop dfsadmin -report and locate the DFS Remaining value
  • A,C
  • Run hadoop dfsadmin -spaceQuota and subtract DFS Used % from Configured Capacity

Answer : A,C
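
A small sketch of the hadoop dfsadmin -report approach, assuming the hadoop binary is on the PATH and that the report prints a "DFS Remaining" line, as CDH4-era HDFS does:

    import subprocess

    # Run `hadoop dfsadmin -report` and print the cluster-wide
    # "DFS Remaining" line from its output.
    report = subprocess.run(
        ["hadoop", "dfsadmin", "-report"],
        capture_output=True, text=True, check=True,
    ).stdout

    for line in report.splitlines():
        if line.startswith("DFS Remaining"):
            print(line.strip())
            break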

Your developers request that you enable them to use Pig on your Hadoop cluster. What do you need to install and/or configure?


Options are :

  • Install the Pig interpreter on the client machines only.
  • Install the Pig interpreter on the master node which is running the JobTracker
  • Install the Pig JARs on all slave nodes in the cluster, and the Pig interpreter on the client machines
  • Install the Pig interpreter on all nodes in the cluster, and on the client machines

Answer : Install the Pig interpreter on the client machines only.

Identify two features/issues that MapReduce v2 (MRv2/YARN) is designed to address:


Options are :

  • Single point of failure in the NameNode.
  • Standardize on a single MapReduce API.
  • HDFS latency.
  • A,C
  • Ability to run frameworks other than MapReduce, such as MPI.
  • Resource pressure on the JobTracker
  • Reduce complexity of the MapReduce APIs.

Answer : A,C

You configure your Hadoop cluster with MapReduce v1 (MRv1) along with HDFS High Availability (HA) using Quorum-based storage. On which nodes should you configure and run your JournalNode daemon(s) to guarantee a quorum?


Options are :

  • Jobtracker
  • Namenode and standby namenode
  • On each datanode
  • standby namenode
  • standby namenode, jobtracker, resourcemanager
  • Namenode, standby namenode and jobtracker
  • Namenode

Answer : standby namenode, jobtracker, resourcemanager

Your developers request that you enable them to use Hive on your Hadoop cluster. What do you install and/or configure?


Options are :

  • Install the Hive interpreter on the client machines only, and configure a shared remote Hive Metastore.
  • Install the Hive Interpreter on the client machines and all the slave nodes, and configure a shared remote Hive Metastore.
  • Install the Hive interpreter on the client machines and all nodes on the cluster
  • Install the Hive interpreter on the master node running the JobTracker, and configure a shared remote Hive Metastore.

Answer : Install the Hive interpreter on the client machines only, and configure a shared remote Hive Metastore.

Your existing Hadoop cluster has 30 slave nodes, each of which has 4 x 2TB hard drives. You plan to add another 10 nodes. How much disk space can your new nodes contain?


Options are :

  • The new nodes can contain any amount of disk space
  • The new nodes cannot contain more than 8TB of disk space
  • The new nodes must all contain 8TB of disk space, but it does not matter how the disks are configured
  • The new nodes must all contain 4 x 2TB hard drives

Answer : The new nodes can contain any amount of disk space

How does HDFS Federation help HDFS Scale horizontally?


Options are :

  • HDFS Federation reduces the load on any single NameNode by using multiple, independent NameNodes to manage individual parts of the filesystem namespace.
  • HDFS Federation provides cross-data center (non-local) support for HDFS, allowing a cluster administrator to split the Block Storage outside the local cluster.
  • HDFS Federation improves the resiliency of HDFS in the face of network issues by removing the NameNode as a single-point-of-failure.
  • HDFS Federation allows the Standby NameNode to automatically resume the services of an active NameNode.

Answer : HDFS Federation reduces the load on any single NameNode by using multiple, independent NameNodes to manage individual parts of the filesystem namespace.

Once a client application validates its identity and is granted access to a file in a cluster, what is the remainder of the read path back to the client?


Options are :

  • Here is how a client RPC request to the Hadoop HDFS NameNode flows through the NameNode. The Hadoop NameNode receives requests from HDFS clients in the form of Hadoop RPC requests over a TCP connection. Typical client requests include mkdir, getBlockLocations, create file, etc. Remember HDFS separates metadata from actual file data, and that the NameNode is the metadata server. Hence, these requests are pure metadata requests; no data transfer is involved
  • The NameNode gives the client the block IDs and a list of DataNodes on which those blocks are found, and the application reads the blocks directly from the DataNodes
  • All of the above
  • The NameNode maps the read request against the block locations in its stored metadata and reads those blocks from the DataNodes. The client application then reads the blocks from the NameNode
  • The NameNode maps the read request against the block locations in its stored metadata; the block IDs are sorted by their distance to the client and moved to the DataNode closest to the client according to Hadoop rack topology. The client application then reads the blocks from that single DataNode

Answer : The NameNode gives the client the block IDs and a list of DataNodes on which those blocks are found, and the application reads the blocks directly from the DataNodes

Which two features does Kerberos security add to a Hadoop cluster?


Options are :

  • A,D
  • Root access to the cluster for users hdfs and mapred, but non-root access for clients
  • Authentication for user access to the cluster against a central server
  • Encryption for data on disk ("at rest")
  • Encryption on all remote procedure calls (RPCs)
  • User authentication on all remote procedure calls (RPCs)

Answer : A,D

What action occurs automatically on a cluster when a DataNode is marked as dead?


Options are :

  • The replication factor of the files which had blocks stored on the dead DataNode is temporarily reduced, until the dead DataNode is recovered and returned to the cluster.
  • The next time a client submits a job that requires blocks from the dead DataNode, the JobTracker receives no heartbeats from the DataNode. The JobTracker tells the NameNode that the DataNode is dead, which triggers block re-replication on the cluster.
  • The NameNode forces re-replication of all the blocks which were stored on the dead DataNode.
  • The NameNode informs the client which wrote the blocks that they are no longer available; the client then re-writes the blocks to a different DataNode.

Answer : The NameNode forces re-replication of all the blocks which were stored on the dead DataNode.

What are the permissions of a file in HDFS with the following: rw-rw-r-x?


Options are :

  • The owner and group cannot delete the file, but others can
  • HDFS runs in user space, which makes all users with access to the namespace able to read, write and modify all files
  • The owner and group can read the file; others can't
  • The owner and group can modify the contents of the file; others can't
  • No one can modify the contents of the file

Answer : The owner and group can modify the contents of the file; others can't
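
To make the keyed answer concrete, here is a purely illustrative decoder for a permission string such as rw-rw-r-x; HDFS applies the same owner/group/other rwx semantics (the execute bit is ignored for regular files):

    # Decode a permission string like "rw-rw-r-x" into
    # owner / group / other capabilities.
    def decode(perm: str) -> dict:
        return {
            who: {"read": bits[0] == "r", "write": bits[1] == "w", "execute": bits[2] == "x"}
            for who, bits in zip(("owner", "group", "other"),
                                 (perm[0:3], perm[3:6], perm[6:9]))
        }

    print(decode("rw-rw-r-x"))
    # Owner and group have read+write; others have only read (and execute),
    # so only the owner and group can modify the file's contents.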

You have a cluster running with the Fair Scheduler enabled. There are currently no jobs running on the cluster. You submit a job A, so that only job A is running on the cluster. A while later, you submit job B. Now job A and job B are running on the cluster at the same time. How will the Fair Scheduler handle these two jobs?


Options are :

  • When job B gets submitted, job A has to finish first, before job B can get scheduled.
  • When job B gets submitted, it will get assigned tasks, while job A continues to run with fewer tasks.
  • When job A gets submitted, it consumes all the task slots
  • When job A gets submitted, it doesn't consume all the task slots

Answer : When job B gets submitted, it will get assigned tasks, while job A continues to run with fewer tasks.

Which two updates occur when a client application opens a stream to begin a file write on a cluster running MapReduce v1 (MRv1)?


Options are :

  • The metadata in RAM on the NameNode is updated.
  • The change is written to the edits file.
  • Once the write stream closes on the DataNode, the DataNode immediately initiates a block report to the NameNode.
  • The metadata in the RAM on the NameNode is flushed to disk.
  • The metadata in RAM on the NameNode is flushed to disk.
  • E,F
  • The change is written to the NameNode disk.

Answer : E,F

For a MapReduce job, on a cluster running MapReduce v1 (MRv1), what is the relationship between tasks and task attempts?


Options are :

  • The developer sets the number of task attempts on job submission.
  • There are always exactly as many task attempts as there are tasks.
  • There are always at least as many task attempts as there are tasks.
  • There are always at most as many task attempts as there are tasks.

Answer : There are always at least as many task attempts as there are tasks.

You have a cluster running with the Fair Scheduler enabled and configured. You submit multiple jobs to the cluster. Each job is assigned to a pool. What are the two key points to remember about how jobs are scheduled with the Fair Scheduler?


Options are :

  • Each pool gets 1/N of the total available task slots, where N is the number of jobs running on the cluster
  • Each pool's share of task slots may change throughout the course of job execution
  • Each pool gets 1/M of the total available task slots, where M is the number of nodes in the cluster
  • Pools are assigned priorities; pools with higher priorities are executed before pools with lower priorities
  • D,F
  • Each pool's share of the task slots remains static within the execution of any individual job
  • Pools get a dynamically-allocated share of the available task slots (subject to additional constraints)

Answer : D,F

How does the NameNode know DataNodes are available on a cluster running MapReduce v1 (MRv1)?


Options are :

  • The NameNode sends a broadcast across the network when it first starts, and DataNodes respond.
  • DataNodes heartbeat in to the master on a regular basis.
  • The NameNode broadcasts a heartbeat on the network on a regular basis, and DataNodes respond.
  • DataNodes are listed in the dfs.hosts file, which the NameNode uses as the definitive list of available DataNodes.

Answer : DataNodes heartbeat in to the master on a regular basis.

Which two steps must you perform if you are running a Hadoop cluster with a single NameNode and six DataNodes, and you want to change a configuration parameter so that it affects all six DataNodes?


Options are :

  • B,D
  • You don't need to restart any daemon, as they will pick up changes automatically.
  • You must modify the configuration files on each of the six DataNode machines.
  • You must restart the NameNode daemon to apply the changes to the cluster
  • You must modify the configuration files on only one of the DataNode machines
  • You must restart all six DataNode daemons to apply the changes to the cluster.
  • You must modify the configuration files on the NameNode only. DataNodes read their configuration from the master nodes.

Answer : B,D
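
The keyed answer (edit the files on every DataNode, then restart all six DataNode daemons) can be scripted. A minimal sketch, assuming passwordless SSH to the slaves, the CDH4 config path /etc/hadoop/conf/, and the hadoop-hdfs-datanode init script; the hostnames and paths are assumptions about the environment, not part of the question:

    import subprocess

    # Push an edited hdfs-site.xml to every DataNode and restart the daemon.
    # Adjust hostnames, config path, and service name for your environment.
    datanodes = [f"datanode{i}" for i in range(1, 7)]   # hypothetical hostnames

    for host in datanodes:
        subprocess.run(
            ["scp", "hdfs-site.xml", f"{host}:/etc/hadoop/conf/hdfs-site.xml"],
            check=True,
        )
        subprocess.run(
            ["ssh", host, "sudo service hadoop-hdfs-datanode restart"],
            check=True,
        )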

Identify four characteristics of a 300MB file that has been written to HDFS with a block size of 128MB and all other Hadoop defaults unchanged?


Options are :

  • Each block will be replicated nine times
  • The third block will be 64MB
  • All three blocks will be 128MB
  • The file will be split into three blocks when initially written into the cluster
  • C,D,E,F
  • Two of the initial blocks will be 128MB
  • The file will consume 1152MB of space in the cluster
  • Each block will be replicated three times
  • The third initial block will be 44MB

Answer : C,D,E,F
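
The arithmetic behind this question can be checked directly; the sketch below assumes the stated 300MB file, 128MB block size, and the default replication factor of 3:

    # Worked arithmetic for the 300MB file: 128MB block size, replication 3.
    file_mb, block_mb, replication = 300, 128, 3

    full_blocks, last_block = divmod(file_mb, block_mb)   # 2 full blocks, 44MB remainder
    blocks = [block_mb] * full_blocks + ([last_block] if last_block else [])

    print(blocks)                       # [128, 128, 44] -> three blocks, last one 44MB
    print(sum(blocks) * replication)    # 900 -> about 900MB of raw cluster space consumed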

What metadata is stored on a DataNode when a block is written to it?


Options are :

  • None. Only the block itself is written.
  • Information on the file's location in HDFS.
  • Node location of each block belonging to the same namespace.
  • Checksums for the data in the block, as a separate file.

Answer : Checksums for the data in the block, as a separate file.

Your cluster is running MapReduce v1 (MRv1), with default replication set to 3 and a cluster block size of 64MB. Identify which best describes the file read process when a client application connects to the cluster and requests a 50MB file.


Options are :

  • The client queries the NameNode for the locations of the block, and reads all three copies. The first copy to complete transfer to the client is the one the client reads as part of Hadoop's execution framework.
  • The client queries the NameNode and then retrieves the block from the nearest DataNode to the client, and then passes that block back to the client.
  • The client queries the NameNode for the locations of the block, and reads from the first location in the list it receives.
  • The client queries the NameNode for the locations of the block, and reads from a random location in the list it receives to eliminate network I/O loads by balancing which nodes it retrieves data from at any given time.

Answer : The client queries the NameNode for the locations of the block, and reads from the first location in the list it receives.

What is the smallest number of slave nodes you would need to configure in your Hadoop cluster to store 100TB of data, using Hadoop default replication values, on nodes with 10TB of raw disk space per node?


Options are :

  • 75
  • 10
  • 40
  • 25
  • 100

Answer : 40
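
A hedged worked calculation: pure storage arithmetic gives 30 nodes, and the keyed answer of 40 follows only if you also assume roughly a quarter of each node's raw disk is reserved for non-HDFS use (MapReduce intermediate data, logs, OS), a common CDH-era sizing rule of thumb rather than anything stated in the question:

    import math

    # Sizing arithmetic for the 100TB question, assuming replication factor 3
    # and ~25% of each node's raw disk reserved for non-HDFS use (assumption).
    data_tb, replication, raw_per_node_tb = 100, 3, 10
    reserved_fraction = 0.25

    raw_needed = data_tb * replication                             # 300TB of replicated data
    usable_per_node = raw_per_node_tb * (1 - reserved_fraction)    # 7.5TB per node

    print(math.ceil(raw_needed / raw_per_node_tb))   # 30 if every byte of disk were usable
    print(math.ceil(raw_needed / usable_per_node))   # 40, matching the keyed answer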

What does each block of a file contain when it is written into HDFS?


Options are :

  • Each block has a header and footer containing metadata
  • Each block has a header containing metadata
  • Each block writes a separate meta file containing information on the file name of which the block is a part
  • Each block contains only data from the file

Answer : Each block writes a separate meta file containing information on the file name of which the block is a part

A slave node in your cluster has 24GB of RAM and 12 physical processor cores on a hyperthreading-enabled processor. You set the value of mapred.child.java.opts to -Xmx1G, and the value of mapred.tasktracker.map.tasks.maximum to 12. What is the appropriate value to set for mapred.tasktracker.reduce.tasks.maximum?


Options are :

  • 24
  • 12
  • 2
  • 16
  • 20
  • 6

Answer : 12
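
The keyed answer of 12 can be sanity-checked with a rough memory budget, assuming each task JVM is capped at the 1GB heap set by mapred.child.java.opts:

    # Rough slot-memory budget: 24GB of RAM, 1GB heap per task JVM (-Xmx1G),
    # 12 map slots already configured.
    ram_gb, heap_per_task_gb, map_slots = 24, 1, 12

    reduce_slots = 12                     # the keyed answer
    total_task_heap = (map_slots + reduce_slots) * heap_per_task_gb

    print(total_task_heap)                # 24GB -> fills RAM exactly; in practice you
                                          # would leave headroom for the DataNode,
                                          # TaskTracker, and OS, so treat this as an
                                          # upper bound rather than a recommendation.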

Your cluster implements HDFS High Availability (HA). Your two NameNodes are named nn01 and nn02. What occurs when you execute the command: hdfs haadmin -failover nn01 nn02


Options are :

  • nn01 is fenced, and nn02 becomes the active NameNode
  • nn01 is fenced, and nn01 becomes the active NameNode
  • nn01 becomes the standby NameNode and nn02 becomes the active NameNode
  • nn02 becomes the standby NameNode and nn02 becomes the active NameNode

Answer : nn01 is fenced, and nn02 becomes the active NameNode
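
After a failover like the one in the question, each NameNode's role can be confirmed with hdfs haadmin -getServiceState. A minimal sketch, assuming the hdfs binary is on the PATH and nn01/nn02 are the configured NameNode IDs:

    import subprocess

    # Query the HA state of each NameNode after the failover.
    for nn in ("nn01", "nn02"):
        state = subprocess.run(
            ["hdfs", "haadmin", "-getServiceState", nn],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        print(f"{nn}: {state}")   # nn02 should now report "active"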

Identify the function performed by the Secondary NameNode daemon on a cluster configured to run with a single NameNode.


Options are :

  • In this configuration, the Secondary NameNode performs a checkpoint operation on the files used by the NameNode.
  • In this configuration, the Secondary NameNode serves as an alternate data channel for clients to reach HDFS, should the NameNode become too busy.
  • In this configuration, the Secondary NameNode is a standby NameNode, ready to fail over and provide high availability.
  • In this configuration, the Secondary NameNode performs real-time backups of the NameNode.

Answer : In this configuration, the Secondary NameNode performs a checkpoint operation on the files used by the NameNode.
