Assignment 6: Analyzing a Social Network
Objectives
- to implement breadth-first search
Introduction
Social networks are complex, data-rich entities in which many different fields, ranging from anthropology to biology, take special interest. Their study and use are truly staggering—you've probably engaged with a service within the past hour that leverages, in some way, information gleaned from a digital implementation of such an entity (possibly for the service's profit). In this assignment, we'll be modeling a social network using an undirected graph, which is a type of graph for which associativity is the defining relationship between entities––not directionality. More specifically, we'll be probing two different queries:- to which degree one member of the social network is separated from another
- the identities of the mutual points of connection (i.e. names of friends) that exist between two members.
Assignment
There are two parts to this assignment:- modifying the
graphADT according to the specification in/c/cs223/hw6/Required/graph.h; and - writing a program called
Friendsthat can parse input containing a representation of a graph and a list of names and output values for the two probes listed above.
Implementing the graph ADT
Complete the implementation of the functions declared
in /c/cs223/hw6/Required/graph.h
to implement a graph that performs each of the following functions in accordance with their respective bounds:
undigraph_create: $O(1)$undigraph_has_vertex: expected $O(1)$, worst-case $O(V)$undigraph_add_edge: expected $O(1)$, worst-case $O(V + E)$undigraph_has_edge: expected $O(1)$, worst-case $O(V + E)$undigraph_bfs_distance: expected $O(V + E)$undigraph_common_neighbors: expected $O(D' * S' + D'' * S'')$undigraph_destroy: expected $O(V + E)$
malloc and
free
run in $O(1)$ time. Note that, in the above statements, V represents the number of
vertices in the graph, E the number of edges in the graph, D
the degree of one particular vertex, and S the length of the longest string connected
to the relevant vertex. For the purposes of this assignment, we let S be bounded by a
constant, and we ignore it.
Your ADT will be subjected to a gauntlet of unit tests that
examine individual functions. An incomplete suite of unit tests is implemented
in /c/cs223/hw6/Required/graph_unit.c,
which will be replaced
with a complete test suite for the private test script.
Your makefile must include a target GraphUnit that is built
from those two files and your implementation of the graph
module.
All unit tests must pass with no errors reported from valgrind,
and any use of your graph module that doesn't violate
the functions' preconditions must result in no errors from valgrind.
No public or private tests will violate the preconditions.
Note that although you may not use all of the functions provided by the
graph module when writing the Friends program,
you must implement all of them for this part of the assignment.
Writing the Friends program
Use your graph ADT implementation to write a program that performs
the two functions specified above.
Input
The command-line arguments will be as follows:- The first command-line argument will be the name of the file representing the graph
-
The second command-line argument will be one of the following keywords:
distance,mutual, andboth
More specifically, the relevant file, here referred to
as inputs, will contain a number of lines
each with two non-empty strings of characters, or names, separated by a single comma.
Each name represents a vertex of the
graph (a person in the social network), and each line represents an
edge ("friendship"). Each edge (that is, vertex-vertex combination, not permutation)
is guaranteed to be unique, and each name is guaranteed to consist of 20 or fewer characters.
Edges in inputs cannot extend from a vertex to itself.
Each query will be supplied from standard input in the same manner as name pairs
above (comma-delimited, each less than or equal to 20 characters in length), giving names of
people on which one (or both) of the two tasks are to be performed. Each name must also be
present in
inputs. Both
input sources will be
terminated by a newline character, and there will not be any other whitespace throughout,
aside from other implicitly necessary newline characters. Commas will not appear in any names.
Output
If the second command-line argument isdistance, the Friends program will then
output for each query q:
- the path length between the two
names inq, inclusive of the second - a newline character
If the second command-line argument is mutual, the Friends program will then
output for each query q:
- a space-delimited list of each
nameconnecting the twonames inq, sorted in order of increasing comparison values as dictated bystrcmp - a newline character
If the second command-line argument is both, then the output should be the specified output for
distance, followed by the specified output for mutual, for each query
supplied in standard input.
The output should be formatted as in the example below. Note that, if distance is specified and
some query consists of vertices between which there exists no path, the string
v, v' are not connected should be printed to standard output followed by a
newline character. Similarly, if two identical vertices are provided in some query
under distance the string v, v' are the same person should be printed to
standard output followed by a newline character.
Similarly, if mutual is specified
and some query consists of names that have no mutual friends, the string
v, v' have no mutual friends should be printed to standard output followed
by a newline character. If any of the input
(standard input, inputs, or the remaining command-line argument) is invalid for any reason, then the output can be anything
and your program must not crash or hang.
Note that the required runtime bounds given above are specified for the graph ADT. If we wish to express required runtime as a function of graph size and queries, the bounds are necessarily different.
Several files have been made available to you in /c/cs223/hw6/Required/ and
/c/cs223/hw6/Optional/. In particular, consider the incomplete graph.c,
which features at its heart an adjacency matrix. You will probably want to change what's given iteratively,
gradually working towards a solution that satisfies all of the specified time bounds. Additionally,
note that gmap is a dependency of graph.c, but only its header file accompanies. Finally,
you may wish to incorporate gqueue, an ADT implementing a generic queue, into your submission in some
capacity.
If we let Q be the number of provided queries, Friends must run in expected $O(Elog(E) * Q)$ if mutual is given as the second
command-line argument and in expected $O((E + V) * Q)$if distance is given as the second command-line argument.
Examples
[crb84@turtle hw6]$ cat tests/graph_alpha_subgraphs_2.in A,B B,C D,E [crb84@turtle hw6]$ cat tests/query_alpha_4.in A,B A,C B,D B,E [crb84@turtle hw6]$ ./Friends tests/graph_alpha_subgraphs_2.in distance < tests/query_alpha_4.in A, B have minimum path length 1 A, C have minimum path length 2 B, D are not connected B, E are not connected [crb84@turtle hw6]$ cat tests/graph_alpha_mini_11.in A,B B,C A,C A,F F,B F,E E,D B,D C,D D,G G,E B,E [crb84@turtle hw6]$ cat tests/query_alpha_3.in A,E C,F B,G [crb84@turtle hw6]$ ./Friends tests/graph_alpha_mini_11.in both < tests/query_alpha_3.in A, E have minimum path length 2 A, E have mutual friends B F C, F have minimum path length 2 C, F have mutual friends A B B, G have minimum path length 2 B, G have mutual friends D E
Submissions
Submit your makefile, log file, and source code necessary to build the executablesFriends and GraphUnit using your makefile.
Do not submit graph.h or graph_unit.c, as you may not change those files.
Your makefile should assume that the files from
/c/cs223/hw6/Required have been copied into
the current directory but not that they are still in their original locations,
and you should not submit those files.