CCA-410 Certified Administrator Apache Hadoop CDH4 Exam Set 1

On a cluster running MapReduce v1 (MRv1), a MapReduce job is given a directory of 10 plain text files as its input. Each file is made up of 3 HDFS blocks. How many Mappers will run?


Options are :

  • 10
  • 1
  • We cannot say; the number of Mappers is determined by the developer
  • 30

Answer : 30
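
Worked calculation: with the default input format, each HDFS block of each input file becomes one input split, and each input split is processed by its own map task, so 10 files x 3 blocks per file = 30 Mappers.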


Using Cloudera Manager on a CDH4 cluster running MapReduce v1 (MRv1), you delete a TaskTracker role instance from a host that also runs a DataNode role instance and a RegionServer role instance. Cloudera Manager makes the changes to the cluster and prompts you to accept them. What other configuration option will Cloudera Manager automatically prompt you to change?


Options are :

  • The option to immediately rebalance the cluster
  • The option to change the Java maximum heap sizes for the other role instances
  • The option to specify an alternate slave host on which to place the DataNode role instance
  • The option to fail over the NameNode instance

Answer : The option to specify an alternate slave host on which to place the DataNode role instance

You are running two Hadoop clusters (cluster1 and cluster2) that run identical versions of Hadoop. You want to copy the data in /home/foo on cluster1 into the directory /home/bar on cluster2. What is the correct distcp syntax to copy one directory tree from one cluster to the other?


Options are :

  • $ distCp cluster1:/home/foo cluster2:/home/bar/
  • $ Hadoop distCp cluster1:/home/foo cluster2:/home/bar/
  • $ distCp hdfs://cluster1/home/foo hdfs://cluster2/home/bar/
  • $ hadoop distCp hdfs://cluster1/home/foo hdfs://cluster2/home/bar/

Answer : $ hadoop distCp hdfs://cluster1/home/foo hdfs://cluster2/home/bar/
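
For illustration, a minimal sketch of such a copy (the NameNode hostnames and the default port 8020 are placeholder assumptions here):

$ hadoop distcp hdfs://nn-cluster1:8020/home/foo hdfs://nn-cluster2:8020/home/bar

Because both clusters run identical Hadoop versions, hdfs:// can be used for both the source and the destination; when versions differ, the source would instead be read over the read-only hftp:// protocol.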

MapReduce v2 (MRv2/YARN) splits which two major functions of the JobTracker into separate daemons?


Options are :

  • Resource management
  • MapReduce metric reporting
  • Managing tasks
  • Job coordination between the resource manager and the node manager
  • Job scheduling/monitoring
  • Health status check (heartbeats)
  • Managing file system metadata
  • Launching tasks

Answer : Resource management; Job scheduling/monitoring


A client application opens a file write stream on your cluster. Which two metadata changes occur during a file write?


Options are :

  • The change is written to the Secondary NameNode
  • The metadata in RAM on the NameNode is flushed to disk
  • The NameNode triggers a block report to update block locations in the edits file
  • The change is written to the fsimage file
  • The change is written to the NameNode disk
  • The change is written to the edits file
  • The metadata in RAM on the NameNode is updated

Answer : The change is written to the edits file; The metadata in RAM on the NameNode is updated

What is the best disk configuration for slave nodes in a Hadoop cluster where each node has 6 x 2TB drives?


Options are :

  • Six separate volumes
  • Three RAID 1 arrays
  • A RAID 5 array
  • A single Linux LVM (Logical volume Manager) volume

Answer : Six separate volumes

Compare the hardware requirements of the NameNode with those of the DataNodes in a Hadoop cluster running MapReduce v1 (MRv1):


Options are :

  • The NameNode and DataNodes should have the same hardware configuration.
  • The NameNode requires more memory but less disk capacity.
  • The NameNode requires less memory and less disk capacity than the DataNodes.
  • The NameNode requires more memory and requires greater disk capacity than the DataNodes.
  • The NameNode requires more memory and no disk drives.

Answer : The NameNode requires more memory but less disk capacity.

You configure your Hadoop development cluster with both MapReduce frameworks, MapReduce v1 (MRv1) and MapReduce v2 (MRv2/YARN). You plan to run only one set of MapReduce daemons at a time in this development environment (running both simultaneously results in an unstable cluster, but configuring both and moving between them is fine). Which two MapReduce daemons do you need to configure to run on your master nodes?


Options are :

  • ApplicationMaster
  • ContainerManager
  • NodeManager
  • JobTracker
  • JournalNode
  • ResourceManager

Answer : JobTracker; ResourceManager

Your Hadoop cluster contains nodes in three racks. Which scenario results if you leave the dfs.hosts property in the NameNode's configuration file empty (blank)?


Options are :

  • Any machine running the DataNode daemon can immediately join the cluster.
  • No new nodes can be added to the cluster until you specify them in the dfs.hosts file.
  • The NameNode will update the dfs.hosts property to include machines running the DataNode daemon on the next NameNode reboot or with a dfsadmin -refreshNodes.
  • Presented with a blank dfs.hosts property, the NameNode will permit DataNodes specified in mapred.hosts to join the cluster.

Answer : Any machine running the DataNode daemon can immediately join the cluster.

Which two daemons must be installed on the master nodes of a Hadoop cluster running MapReduce v1 (MRv1)?


Options are :

  • ApplicationMaster
  • TaskTracker
  • NameNode
  • JobTracker
  • DataNode
  • ResourceManager
  • HMaster
  • ZooKeeper

Answer : NameNode; JobTracker

What occurs when you run a Hadoop job that specifies an output directory for the job output which already exists in HDFS?


Options are :

  • An error will occur immediately because the output directory must not already exist.
  • An error will occur after the mappers have completed but before any reducers begin to run, because the output path must not exist during the shuffle and sort.
  • The job will run successfully. Output from the reducers will overwrite the contents of the existing directory.
  • The job will run successfully. Output from the reducers will be placed in a directory called job output-1.

Answer : An error will occur immediately because the output directory must not already exist.

Choose the option that best describes a Hadoop cluster's block size storage parameters once you set the HDFS default block size to 64MB.


Options are :

  • The block size of files in the cluster can be determined as the block is written.
  • The block size of files in the cluster will all be multiples of 64MB.
  • The block size of files in the cluster will all be exactly 64MB.
  • The block size of files in the cluster will all be at least 64MB.

Answer : The block size of files in the cluster will all be exactly 64MB.

Identify four pieces of cluster information that are stored on disk on the NameNode.


Options are :

  • An edit log of changes that have been made since the last snapshot compaction by the Secondary NameNode.
  • A catalog of DataNodes and the blocks that are stored on them.
  • Names of the files in HDFS.
  • The directory structure of the files in HDFS.
  • The status of the heartbeats of each DataNode.
  • File permissions of the files in HDFS.
  • An edit log of changes that have been made since the last snapshot of the NameNode.

Answer : Names of the files in HDFS; The directory structure of the files in HDFS; File permissions of the files in HDFS; An edit log of changes that have been made since the last snapshot compaction by the Secondary NameNode.

Which MapReduce daemon instantiates user code, and executes map and reduce tasks on a cluster running MapReduce v1 (MRv1)?


Options are :

  • DataNode
  • JobTracker
  • ApplicationMaster
  • NodeManager
  • TaskTracker
  • NameNode
  • ResourceManager

Answer : TaskTracker

How must you format the underlying filesystem of your Hadoop cluster's slave nodes running on Linux?


Options are :

  • They may be formatted as any Linux filesystem
  • They must be formatted as either ext3 or ext4
  • They must be formatted as HDFS
  • They must not be formatted; HDFS will format the filesystem automatically

Answer : They may be formatted as any Linux filesystem

Using Hadoop's default settings, how much data will you be able to store on your Hadoop cluster if it has 12 nodes with 4TB of raw disk space per node allocated to HDFS storage?


Options are :

  • Approximately 48TB
  • Approximately 3TB
  • Approximately 16TB
  • Approximately 12TB

Answer : Approximately 16TB
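
Worked calculation, assuming the default HDFS replication factor of 3: 12 nodes x 4TB = 48TB of raw capacity, and 48TB / 3 replicas = 16TB of usable storage.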

In a cluster configured with HDFS High Availability (HA) but NOT HDFS federation, each map task runs:


Options are :

  • In the same Java Virtual Machine as the DataNode.
  • In the same Java Virtual Machine as the JobTracker.
  • In the same Java Virtual Machine as the TaskTracker
  • In its own Java Virtual Machine.

Answer : In its own Java Virtual Machine.

Identify the two daemons that typically run on each slave node in a Hadoop cluster running MapReduce v1 (MRv1).


Options are :

  • DataNode
  • JobTracker
  • NameNode
  • TaskTracker
  • NodeManager
  • Secondary NameNode

Answer : DataNode; TaskTracker

You set mapred.tasktracker.reduce.tasks.maximum to a value of 4 on a cluster running MapReduce v1 (MRv1). How many reducers will run for any given job?


Options are :

  • A maximum of 4 reducers, but the actual number of reducers that run for any given job is based on the volume of intermediate data.
  • Four reducers will run. Once set by the cluster administrator, this parameter can't be overridden.
  • A maximum of 4 reducers, but the actual number of reducers that run for any given job is based on the volume of input data.
  • The number of reducers for any given job is set by the developer.

Answer : A maximum of 4 reducers, but the actual number of reducers that run for any given job is based on the volume of intermediate data.

Which MapReduce v2 (MRv2/YARN) daemon is a per-machine slave responsible for launching application containers and monitoring application resource usage?


Options are :

  • ApplicationMasterService
  • JobTracker
  • NodeManager
  • TaskTracker
  • ResourceManager
  • ApplicationMaster

Answer : NodeManager

You are planning a Hadoop cluster, and you expect to receive just under 1TB of data per week, which will be stored on the cluster using Hadoop's default replication. You decide that your slave nodes will be configured with 4 x 1TB disks. Calculate the minimum number of slave nodes you need to deploy to store one year's worth of data.


Options are :

  • 10 slave nodes
  • 100 slave nodes
  • 50 slave nodes

Answer : 50 slave nodes
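
Worked calculation: just under 1TB/week x 52 weeks is roughly 52TB of data per year, which at the default replication factor of 3 requires about 156TB of raw capacity. Each slave node contributes 4 x 1TB = 4TB, so at least 156 / 4 = 39 nodes are needed; 50 is the smallest listed option that satisfies this (and leaves headroom for intermediate and temporary data).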

Your cluster's block size is set to 128MB. A client application (client application A) is writing a 500MB file to HDFS. After client application A has written 300MB of data, another client (client application B) attempts to read the file. What is the effect of the second client requesting the file during the write?


Options are :

  • Client application B returns an error
  • Client application B can read the 300MB that has been written so far.
  • Client application B must wait until the entire file has been written, and will then read its entire contents.
  • Application B can read 256MB of the file

Answer : Application B can read 256MB of the file
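
Worked calculation: with a 128MB block size, the 300MB written so far spans two complete blocks (256MB) plus a partially written third block. A reader sees only blocks that have been fully written, so client application B can read 256MB.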

Your Hadoop cluster has 25 nodes with a total of 100 TB (4 TB per node) of raw disk space allocated to HDFS storage. Assuming Hadoop's default configuration, how much data will you be able to store?


Options are :

  • Approximately 33 TB
  • Approximately 100TB
  • Approximately 10TB
  • Approximately 25TB

Answer : Approximately 33 TB
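
Worked calculation, assuming the default replication factor of 3: 100TB of raw capacity / 3 replicas is approximately 33TB of usable storage.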

What does CDH packaging do on install to facilitate Kerberos security setup?


Options are :

  • Creates directories for temp, hdfs, and mapreduce with the correct permissions.
  • Automatically configures permissions for log files at $MAPRED_LOG_DIR/userlogs.
  • Creates users for hdfs and mapreduce to facilitate role assignment.
  • Creates a set of pre-configured Kerberos keytab files and their permissions.
  • Creates and configures your KDC with default cluster values.

Answer : Creates users for hdfs and mapreduce to facilitate role assignment.

The most important consideration for slave nodes in a Hadoop cluster running production jobs that require short turnaround times is:


Options are :

  • The ratio between the number of processor cores and number of disk drives.
  • The ratio between the number of processor cores and total storage capacity.
  • The ratio between the amount of memory and the total storage capacity.
  • The ratio between the number of processor cores and the amount of memory.
  • The ratio between the amount of memory and the number of disk drives.

Answer : The ratio between the number of processor cores and number of disk drives.

Which three processes does HDFS High Availability (HA) enable on your cluster?


Options are :

  • Write data to two clusters simultaneously
  • Configure an unlimited number of hot standby NameNodes
  • Shut one NameNode down for maintenance without halting the cluster
  • Automatically 'fail over' between NameNodes if one goes down
  • Manually 'fail over' between NameNodes

Answer : Shut one NameNode down for maintenance without halting the cluster; Automatically 'fail over' between NameNodes if one goes down; Manually 'fail over' between NameNodes

Choose three reasons why you should run the HDFS balancer periodically.


Options are :

  • To improve data locality for MapReduce tasks.
  • To ensure that all blocks in the cluster are 128MB in size.
  • To help HDFS deliver consistent performance under heavy loads.
  • To ensure that there is capacity in HDFS for additional data.
  • To ensure that there is consistent disk utilization across the DataNodes.

Answer : To improve data locality for MapReduce tasks; To help HDFS deliver consistent performance under heavy loads; To ensure that there is consistent disk utilization across the DataNodes.
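
A minimal sketch of a periodic balancer run (the -threshold argument, a percentage of disk utilization, is optional and defaults to 10):

$ hadoop balancer -threshold 10

The balancer moves blocks from over-utilized DataNodes to under-utilized ones until each node's utilization is within the threshold of the cluster-wide average.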

Which command does Hadoop offer to discover missing or corrupt HDFS data?


Options are :

  • Hadoop does not provide any tools to discover missing or corrupt data; there is no need because three replicas are kept for each data block.
  • Dskchk
  • Fsck
  • Du
  • The map-only checksum utility

Answer : Fsck
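
A minimal sketch of using fsck to check HDFS health (the /user/foo path is a placeholder; / checks the entire namespace):

$ hadoop fsck /
$ hadoop fsck /user/foo -files -blocks -locations

The first form reports missing, corrupt, and under-replicated blocks; the second additionally lists each file's blocks and the DataNodes on which they reside.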

Which daemon instantiates JVMs to perform MapReduce processing in a cluster running MapReduce v1 (MRv1)?


Options are :

  • NodeManager
  • ApplicationManager
  • TaskTracker
  • DataNode
  • ApplicationMaster
  • NameNode
  • JobTracker
  • ResourceManager

Answer : TaskTracker

How do you access the log messages a MapReduce application generates after you run it on your cluster?


Options are :

  • You remotely log in to any slave node in the cluster to browse your job's log files directly in /var/log/hadoop/archive
  • You browse the current working directory of the client machine from which you submitted the job
  • You connect to the JobTracker web UI and locate the details for your job; these will include the log messages
  • You browse the /logs directory in HDFS where all job log files are stored

Answer : You browse the current working directory of the client machine from which you submitted the job
