Assignment 6: Analyzing a Social Network
Objectives
- to implement breadth-first search
Introduction
Social networks are complex, data-rich entities in which many different fields, ranging from anthropology to biology, take special interest. Their study and use are truly staggering—you've probably engaged with a service within the past hour that leverages, in some way, information gleaned from a digital implementation of such an entity (possibly for the service's profit). In this assignment, we'll be modeling a social network using an undirected graph, which is a type of graph for which associativity is the defining relationship between entities––not directionality. More specifically, we'll be probing two different queries:- to which degree one member of the social network is separated from another
- the identities of the mutual points of connection (i.e. names of friends) that exist between two members.
Assignment
There are two parts to this assignment:- modifying the
graph
ADT according to the specification in/c/cs223/hw6/Required/graph.h
; and - writing a program called
Friends
that can parse input containing a representation of a graph and a list of names and output values for the two probes listed above.
Implementing the graph
ADT
Complete the implementation of the functions declared
in /c/cs223/hw6/Required/graph.h
to implement a graph that performs each of the following functions in accordance with their respective bounds:
undigraph_create
: $O(1)$undigraph_has_vertex
: expected $O(1)$, worst-case $O(V)$undigraph_add_edge
: expected $O(1)$, worst-case $O(V + E)$undigraph_has_edge
: expected $O(1)$, worst-case $O(V + E)$undigraph_bfs_distance
: expected $O(V + E)$undigraph_common_neighbors
: expected $O(D' * S' + D'' * S'')$undigraph_destroy
: expected $O(V + E)$
malloc
and
free
run in $O(1)$ time. Note that, in the above statements, V
represents the number of
vertices in the graph, E
the number of edges in the graph, D
the degree of one particular vertex, and S
the length of the longest string connected
to the relevant vertex. For the purposes of this assignment, we let S
be bounded by a
constant, and we ignore it.
Your ADT will be subjected to a gauntlet of unit tests that
examine individual functions. An incomplete suite of unit tests is implemented
in /c/cs223/hw6/Required/graph_unit.c
,
which will be replaced
with a complete test suite for the private test script.
Your makefile must include a target GraphUnit
that is built
from those two files and your implementation of the graph
module.
All unit tests must pass with no errors reported from valgrind
,
and any use of your graph
module that doesn't violate
the functions' preconditions must result in no errors from valgrind
.
No public or private tests will violate the preconditions.
Note that although you may not use all of the functions provided by the
graph
module when writing the Friends
program,
you must implement all of them for this part of the assignment.
Writing the Friends
program
Use your graph
ADT implementation to write a program that performs
the two functions specified above.
Input
The command-line arguments will be as follows:- The first command-line argument will be the name of the file representing the graph
-
The second command-line argument will be one of the following keywords:
distance
,mutual
, andboth
More specifically, the relevant file, here referred to
as inputs
, will contain a number of lines
each with two non-empty strings of characters, or name
s, separated by a single comma.
Each name
represents a vertex of the
graph (a person in the social network), and each line represents an
edge ("friendship"). Each edge (that is, vertex-vertex combination, not permutation)
is guaranteed to be unique, and each name
is guaranteed to consist of 20 or fewer characters.
Edges in inputs
cannot extend from a vertex to itself.
Each query
will be supplied from standard input in the same manner as name
pairs
above (comma-delimited, each less than or equal to 20 characters in length), giving name
s of
people on which one (or both) of the two tasks are to be performed. Each name
must also be
present in
inputs
. Both
input sources will be
terminated by a newline character, and there will not be any other whitespace throughout,
aside from other implicitly necessary newline characters. Commas will not appear in any name
s.
Output
If the second command-line argument isdistance
, the Friends
program will then
output for each query
q
:
- the path length between the two
name
s inq
, inclusive of the second - a newline character
If the second command-line argument is mutual
, the Friends
program will then
output for each query
q
:
- a space-delimited list of each
name
connecting the twoname
s inq
, sorted in order of increasing comparison values as dictated bystrcmp
- a newline character
If the second command-line argument is both
, then the output should be the specified output for
distance
, followed by the specified output for mutual
, for each query
supplied in standard input.
The output should be formatted as in the example below. Note that, if distance
is specified and
some query
consists of vertices between which there exists no path, the string
v, v' are not connected
should be printed to standard output followed by a
newline character. Similarly, if two identical vertices are provided in some query
under distance
the string v, v' are the same person
should be printed to
standard output followed by a newline character.
Similarly, if mutual
is specified
and some query
consists of names that have no mutual friends, the string
v, v' have no mutual friends
should be printed to standard output followed
by a newline character. If any of the input
(standard input, inputs
, or the remaining command-line argument) is invalid for any reason, then the output can be anything
and your program must not crash or hang.
Note that the required runtime bounds given above are specified for the graph ADT. If we wish to express required runtime as a function of graph size and queries, the bounds are necessarily different.
Several files have been made available to you in /c/cs223/hw6/Required/
and
/c/cs223/hw6/Optional/
. In particular, consider the incomplete graph.c
,
which features at its heart an adjacency matrix. You will probably want to change what's given iteratively,
gradually working towards a solution that satisfies all of the specified time bounds. Additionally,
note that gmap
is a dependency of graph.c
, but only its header file accompanies. Finally,
you may wish to incorporate gqueue
, an ADT implementing a generic queue, into your submission in some
capacity.
If we let Q
be the number of provided queries, Friends
must run in expected $O(Elog(E) * Q)$ if mutual
is given as the second
command-line argument and in expected $O((E + V) * Q)$if distance
is given as the second command-line argument.
Examples
[crb84@turtle hw6]$ cat tests/graph_alpha_subgraphs_2.in A,B B,C D,E [crb84@turtle hw6]$ cat tests/query_alpha_4.in A,B A,C B,D B,E [crb84@turtle hw6]$ ./Friends tests/graph_alpha_subgraphs_2.in distance < tests/query_alpha_4.in A, B have minimum path length 1 A, C have minimum path length 2 B, D are not connected B, E are not connected [crb84@turtle hw6]$ cat tests/graph_alpha_mini_11.in A,B B,C A,C A,F F,B F,E E,D B,D C,D D,G G,E B,E [crb84@turtle hw6]$ cat tests/query_alpha_3.in A,E C,F B,G [crb84@turtle hw6]$ ./Friends tests/graph_alpha_mini_11.in both < tests/query_alpha_3.in A, E have minimum path length 2 A, E have mutual friends B F C, F have minimum path length 2 C, F have mutual friends A B B, G have minimum path length 2 B, G have mutual friends D E
Submissions
Submit your makefile, log file, and source code necessary to build the executablesFriends
and GraphUnit
using your makefile.
Do not submit graph.h
or graph_unit.c
, as you may not change those files.
Your makefile should assume that the files from
/c/cs223/hw6/Required
have been copied into
the current directory but not that they are still in their original locations,
and you should not submit those files.